Troubleshooting hardware problems affecting pairs, Troubleshooting with RAID Manager – HP XP P9500 Storage User Manual

Troubleshooting hardware problems affecting pairs

Hardware failures affecting Continuous Access Journal Z are described in the following table.
In addition to the problems described below, hardware failures that affect cache memory or
shared memory can also cause pairs to be suspended.

Classification: Primary or secondary system hardware
SIM: DC0x, DC1x, DC2x
Causes of suspension:
- Hardware redundancy has been lost due to some blockade condition. As a result, one of the
  following could not complete: primary-secondary system communication, journal creation,
  copy operation, restore operation, staging process, or de-staging process.
- Journals cannot be retained because some portion of the cache memory or the shared memory
  has been blocked due to hardware failure.
- The primary system failed to create and transfer journals due to unrecoverable hardware
  failure.
- The secondary system failed to receive and restore journals due to unrecoverable hardware
  failure.
- The drive parity group was in correction-access status while the Cnt Ac-J Z pair was in
  Pending Duplex status.
Recovery procedure:
Depending on the SIM, remove the hardware blockade or failure.
If a failure occurs when Business Continuity Manager is being used, secondary volumes in
Suspend status (equivalent to SWAPPING in Business Continuity Manager terminology) may remain
in the master journal. If these volumes remain, execute the YKRESYNC REVERSE option on the
secondary volumes with a pair status of Suspend (YKRESYNC is the Business Continuity Manager
command for resynchronizing a pair). This operation changes all volumes in the master journal
to primary volumes. After this operation, restore the volume pairs (Resume Pair).

Classification: Communication between the primary and secondary systems
SIM: DC0x, DC1x
Causes of suspension:
- Communication between the systems failed because the secondary system or network relay
  devices were not running.
- Journal volumes remained full even after the timeout period elapsed.
Recovery procedure:
Remove the failure from the primary and secondary systems or the network relay devices.
If necessary, increase resources (for example, the amount of cache, the number of paths
between primary and secondary systems, the parity groups for journal volumes, etc.).
Restore the failed pairs (Resume Pair).

Classification: RIO overload or RIO failure
SIM: DC2x
Causes of suspension:
- An unrecoverable RIO (remote I/O) timeout occurred because the system or network relay
  devices were overloaded. Or, RIO could not be finished due to a failure in the system.
Recovery procedure:
Release failed pairs (Delete Pair).
If necessary, increase resources (for example, the amount of cache, the number of paths
between primary and secondary systems, the parity groups for journal volumes, etc.).
Re-establish failed pairs (Add Pair).

Classification: Planned power outage to the primary system
SIM: DC8x
Causes of suspension:
- The Cnt Ac-J Z pairs were temporarily suspended due to a planned power outage to the
  primary system.
Recovery procedure:
No recovery procedure is required. The primary system automatically removes the suspension
condition when the system is powered on.
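
Because the same SIM code can appear under more than one classification (DC0x and DC1x are listed for both hardware and communication failures), a SIM alone may narrow the cause without pinning it down. The following Python sketch illustrates that lookup. It is not part of any HP tool: the names SIM_CLASSIFICATIONS and classify_sim are hypothetical, and the sketch assumes that the trailing "x" in each code stands for a single variable character.

```python
from typing import Dict, List

HARDWARE = "Primary or secondary system hardware"
COMMUNICATION = "Communication between the primary and secondary systems"
RIO = "RIO overload or RIO failure"
POWER = "Planned power outage to the primary system"

# Classifications transcribed from the table in this section, keyed by the
# first three characters of the SIM code. Treating the trailing "x" as one
# variable character is an assumption of this sketch.
SIM_CLASSIFICATIONS: Dict[str, List[str]] = {
    "DC0": [HARDWARE, COMMUNICATION],
    "DC1": [HARDWARE, COMMUNICATION],
    "DC2": [HARDWARE, RIO],
    "DC8": [POWER],
}


def classify_sim(code: str) -> List[str]:
    """Return every classification the table lists for a SIM code.

    An empty list means the code is not covered by this table.
    """
    prefix = code.strip().upper()[:3]
    return list(SIM_CLASSIFICATIONS.get(prefix, []))


if __name__ == "__main__":
    for sim in ("DC01", "DC2A", "DC80"):
        print(sim, "->", classify_sim(sim))
```

When the lookup returns two classifications, compare the listed causes of suspension against the observed symptoms before choosing a recovery procedure.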

Troubleshooting with RAID Manager

If an error occurs in a Continuous Access Journal Z pair operation while you are using RAID
Manager, you can identify the cause of the error by referring to the RAID Manager operation log
file. The file is stored in the following directory by default:

/HORCM/log*/curlog/horcmlog_HOST/horcm.log

Where:

* is the instance number.

HOST is the host name.
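
As an illustration of how the two placeholders are filled in, the following Python sketch builds the default log path from an instance number and a host name. The function name horcm_log_path, the instance number 0, and the host name host01 are made up for the example. The sketch also assumes the default directory layout shown above, so it does not apply if the log location has been changed.

```python
def horcm_log_path(instance: int, host: str) -> str:
    """Build the default RAID Manager operation log path.

    Substitutes the instance number for "*" and the host name for "HOST"
    in the default path template given in this section.
    """
    return f"/HORCM/log{instance}/curlog/horcmlog_{host}/horcm.log"


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the substitution.
    print(horcm_log_path(0, "host01"))
```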
