Troubleshooting hardware problems affecting pairs, Troubleshooting with raid manager – HP XP P9500 Storage User Manual



Troubleshooting hardware problems affecting pairs

The following table describes hardware failures that affect Continuous Access Journal (Cnt Ac-J)
pairs. In addition to the problems described below, hardware failures that affect cache memory
or shared memory can also cause pairs to be suspended.

Classification: Primary or secondary system hardware
SIM: DC0x, DC1x, DC2x
Causes of suspension:
  - Hardware redundancy has been lost due to some blockade condition. As a result,
    one of the following could not complete: primary-secondary system communication,
    journal creation, copy operation, restore operation, staging process, or de-staging
    process.
  - Journals cannot be retained because some portion of the cache memory or the
    shared memory has been blocked due to hardware failure.
  - The primary system failed to create and transfer journals due to unrecoverable
    hardware failure.
  - The secondary system failed to receive and restore journals due to unrecoverable
    hardware failure.
  - The drive parity group was in correction-access status while the Cnt Ac-J pair
    was in COPY status.
Recovery procedure:
  - Depending on the SIM, remove the hardware blockade or failure.
  - Restore the failed volume pairs (pairresync).
  - If a failure occurs during execution of the RAID Manager horctakeover command,
    secondary volumes in SSWS pair status may remain in the master journal. If these
    volumes remain, execute the pairresync -swaps command on the secondary volumes
    whose pair status is SSWS (pairresync is the RAID Manager command for
    resynchronizing pair and -swaps is a swap option). This operation changes all
    volumes in the master journal to primary volumes. After this operation, restore
    the volume pairs (pairresync).

Classification: Communication between the primary and secondary systems
SIM: DC0x, DC1x
Causes of suspension:
  - Communication between the systems failed because the secondary system or
    network relay devices were not running.
  - Journal volumes remained full even after the timeout period elapsed.
Recovery procedure:
  - Remove the failure from the primary and secondary systems or the network relay
    devices.
  - If necessary, increase resources as needed (for example, the amount of cache,
    the number of paths between primary and secondary systems, the parity groups
    for journal volumes, etc.).
  - Restore the failed pairs (pairresync).

Classification: RIO overload or RIO failure
SIM: DC2x
Causes of suspension:
  - An unrecoverable RIO (remote I/O) timeout occurred because the system or
    network relay devices were overloaded. Or, RIO could not be finished due to a
    failure in the system.
Recovery procedure:
  - Release failed pairs (pairsplit -S).
  - If necessary, increase resources as needed (for example, the amount of cache,
    the number of paths between primary and secondary system, the parity groups
    for journal volumes, etc.).
  - Re-establish failed pairs (paircreate).

Classification: Planned power outage to the primary system
SIM: DC8x
Causes of suspension:
  - The Cnt Ac-J pairs were temporarily suspended due to a planned power outage to
    the primary system.
Recovery procedure:
  - No recovery procedure is required. The primary system automatically removes the
    suspension condition when the system is powered on.
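
Because the same SIM can appear under more than one classification (for example, DC0x is listed for both hardware failures and communication failures), it can help to look up every matching row before choosing a recovery procedure. The following Python sketch is illustrative only and is not part of RAID Manager. The names SIM_TABLE and lookup_sim are hypothetical, and the sketch assumes that the trailing "x" in each SIM stands for any single character of a four-character code.

```python
# Rows transcribed from the table above. Only the classification, the SIM
# patterns, and the RAID Manager command named in the recovery procedure
# are kept. SIM_TABLE and lookup_sim are hypothetical names.
SIM_TABLE = [
    {
        "classification": "Primary or secondary system hardware",
        "sims": ("DC0", "DC1", "DC2"),
        "recovery_command": "pairresync",
    },
    {
        "classification": "Communication between the primary and secondary systems",
        "sims": ("DC0", "DC1"),
        "recovery_command": "pairresync",
    },
    {
        "classification": "RIO overload or RIO failure",
        "sims": ("DC2",),
        "recovery_command": "pairsplit -S, then paircreate",
    },
    {
        "classification": "Planned power outage to the primary system",
        "sims": ("DC8",),
        "recovery_command": None,  # no recovery procedure is required
    },
]


def lookup_sim(code):
    """Return every table row whose SIM pattern matches the given code.

    Assumption: a code is four characters long and the "x" in patterns
    such as "DC0x" matches any single trailing character.
    """
    normalized = code.strip().upper()
    if len(normalized) != 4:
        return []
    prefix = normalized[:3]
    return [row for row in SIM_TABLE if prefix in row["sims"]]
```

A call such as lookup_sim("DC01") returns two rows, so the cause of suspension still has to be confirmed against the descriptions in the table before a recovery procedure is selected.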

Troubleshooting with RAID Manager

If an error occurs in a Continuous Access Journal pair operation performed with RAID
Manager, you can identify the cause of the error by referring to the RAID Manager operation log
file. By default, the file is stored in the following directory:

/HORCM/log*/curlog/horcmlog_HOST/horcm.log

Where:

* is the instance number.

HOST is the host name.
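
As a minimal sketch of how the default location above could be assembled in a script, the following Python helper substitutes the instance number and host name into the path and reads the last lines of the file. The function names horcm_log_path and tail_log are hypothetical, the sketch assumes the default directory has not been changed, and it makes no assumption about the format of the log entries.

```python
from collections import deque


def horcm_log_path(instance, host):
    """Build the default RAID Manager operation log path.

    instance replaces the "*" and host replaces "HOST" in
    /HORCM/log*/curlog/horcmlog_HOST/horcm.log.
    """
    return f"/HORCM/log{instance}/curlog/horcmlog_{host}/horcm.log"


def tail_log(path, count=20):
    """Return the last `count` lines of the log file as a list of strings."""
    with open(path, "r", errors="replace") as handle:
        # deque with maxlen keeps only the most recent lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]
```

For instance number 0 on a host named host1 (both example values), horcm_log_path(0, "host1") returns /HORCM/log0/curlog/horcmlog_host1/horcm.log.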
