6 disaster recovery, The basic recovery process, System failure messages – HP XP Racks User Manual


6 Disaster recovery

On-site disasters, such as power supply failures, can disrupt the normal operation of your ESAM
system. Being able to quickly identify the type of failure and recover the affected system or
component helps to ensure that you can restore high-availability protection for host applications
as soon as possible.

Main types of failures that can disrupt your system

The main types of failures that can disrupt the system are power failures, hardware failures,
connection or communication failures, and software failures. These types of failures can cause
system components to function improperly or stop functioning.

System components typically affected by these types of failures include:

Main control unit (primary storage system)

Service processor (primary or secondary storage system)

Remote control unit (secondary storage system)

Volume pairs

Quorum disks
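When you track incidents, for example in a monitoring script or a runbook tool, it can help to record the failure categories and affected components listed above as fixed vocabularies. The following is a minimal sketch under that assumption. The `Incident` class, its field names, and the `affects_primary_site` helper are hypothetical and are not part of any HP XP tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class FailureType(Enum):
    """Main failure categories named in this chapter."""
    POWER = "power"
    HARDWARE = "hardware"
    CONNECTION = "connection or communication"
    SOFTWARE = "software"


class Component(Enum):
    """Components typically affected by those failures."""
    MCU = "main control unit (primary storage system)"
    SVP = "service processor (primary or secondary storage system)"
    RCU = "remote control unit (secondary storage system)"
    VOLUME_PAIR = "volume pair"
    QUORUM_DISK = "quorum disk"


@dataclass
class Incident:
    """Hypothetical record tying one failure to the components it affected."""
    failure: FailureType
    affected: set[Component] = field(default_factory=set)

    def affects_primary_site(self) -> bool:
        # Only the MCU belongs exclusively to the primary storage system.
        # The SVP can be on either side, so it is not counted here.
        return Component.MCU in self.affected


incident = Incident(FailureType.POWER, {Component.MCU, Component.VOLUME_PAIR})
print(incident.failure.value, incident.affects_primary_site())
```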

The basic recovery process

The basic process for recovering from an on-site disaster is the same, regardless of the type of
failure that caused the disruption in the system. The recovery process involves:

Detecting failures

Determining the type of failure

Determining which recovery procedure to use

Completing the recovery procedure
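If you script any part of your recovery runbook, the four steps above map onto a simple sequence: detect and classify the failure, look up a procedure for that failure type, and then run it. This is only a sketch. The procedure names in `PROCEDURES` are placeholders, and the `detect` and `run` callables are hypothetical hooks that you would supply; the actual recovery procedures are the ones described in this manual.

```python
from typing import Callable, Optional

# Hypothetical mapping from failure type to a recovery procedure label.
# These labels are placeholders for the manual's real procedures.
PROCEDURES: dict[str, str] = {
    "power": "recover-from-power-failure",
    "hardware": "recover-from-hardware-failure",
    "connection": "recover-from-connection-failure",
    "software": "recover-from-software-failure",
}


def recover(
    detect: Callable[[], Optional[str]],
    run: Callable[[str], bool],
) -> Optional[str]:
    """Walk the four steps of the basic recovery process.

    detect: returns a failure type (steps 1 and 2), or None if nothing failed.
    run:    carries out the named procedure and reports success (step 4).
    """
    failure_type = detect()                     # steps 1-2: detect and classify
    if failure_type is None:
        return None                             # nothing to recover
    procedure = PROCEDURES.get(failure_type)    # step 3: choose a procedure
    if procedure is None:
        raise ValueError(f"no procedure for failure type {failure_type!r}")
    if not run(procedure):                      # step 4: complete it
        raise RuntimeError(f"{procedure} did not complete")
    return procedure


print(recover(lambda: "power", lambda procedure: True))
```

Keeping detection and execution as injected callables keeps the step order fixed while leaving the site-specific work to whatever tools you actually use.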

System failure messages

The system automatically generates messages that you can use to detect failures and determine
the type of failure that occurred. The messages contain information about the type of failure.

System information messages (SIM): generated by the primary and secondary storage systems

Path failure messages: generated by the multipath software on the host
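Because each kind of message comes from a different source, one way to get a quick overview after collecting messages is to group them by type. In this sketch, the `Message` record, its fields, and the source labels are hypothetical. It does not parse real SIM or multipath output; it only encodes which source produces which kind of message.

```python
from collections import Counter
from dataclasses import dataclass

# Which source emits which kind of message, per this section.
MESSAGE_TYPE_BY_SOURCE = {
    "primary storage system": "SIM",
    "secondary storage system": "SIM",
    "host multipath software": "path failure message",
}


@dataclass
class Message:
    """Hypothetical record of one collected message; fields are illustrative."""
    source: str   # must be one of the keys in MESSAGE_TYPE_BY_SOURCE
    text: str


def tally_by_type(messages: list[Message]) -> Counter:
    """Count collected messages by type (raises KeyError for an unknown source)."""
    return Counter(MESSAGE_TYPE_BY_SOURCE[m.source] for m in messages)


msgs = [
    Message("primary storage system", "example SIM text"),
    Message("host multipath software", "example path failure text"),
    Message("host multipath software", "example path failure text"),
]
print(tally_by_type(msgs))
```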

Detecting failures

Detecting failures is the first task in the recovery process. Failure detection is essential because you
need to know the type of failure before you can determine which recovery procedure to use.

You have two options for detecting failures. You can check whether failover has occurred and then
determine which type of failure caused it, or you can check for failures directly by using the SIM
and path failure messages.

“Option 1: Check for failover first” (page 51)

“Option 2: Check for failures only” (page 52)

Option 1: Check for failover first

You can use secondary volume status and path status information to determine whether failover
occurred. This information is available from RWC, RAID Manager, or the multipath software.
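The check in Option 1 combines two pieces of information: the status of the secondary volume and the status of the host paths. The sketch below shows that combination as a heuristic. It assumes you have already read both values from RWC, RAID Manager, or the multipath software. The status string `"SVOL_ACTIVE"` and the `PathStatus` record are placeholders, not values or formats reported by those tools, so substitute the statuses that your tools actually display.

```python
from dataclasses import dataclass


@dataclass
class PathStatus:
    """Hypothetical view of one host path as reported by multipath software."""
    target: str    # "primary" or "secondary" storage system
    online: bool


def failover_occurred(
    svol_status: str,
    paths: list[PathStatus],
    svol_active_states: frozenset[str] = frozenset({"SVOL_ACTIVE"}),  # placeholder values
) -> bool:
    """Heuristic: treat failover as having occurred when the S-VOL reports an
    active state and host I/O can reach only the secondary storage system."""
    primary_online = any(p.online for p in paths if p.target == "primary")
    secondary_online = any(p.online for p in paths if p.target == "secondary")
    return svol_status in svol_active_states and not primary_online and secondary_online


paths = [PathStatus("primary", False), PathStatus("secondary", True)]
print(failover_occurred("SVOL_ACTIVE", paths))
```

Requiring both signals avoids mistaking a simple path outage, or a status change on the volume alone, for a completed failover.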
