Compromised fault tolerance, Recovering from compromised fault tolerance, Replacing drives – HP Smart Array P731m Controller User Manual

Compromised fault tolerance

CAUTION: When fault tolerance is compromised, data loss can occur. However, it may be possible to recover the data. For more information, see "Recovering from compromised fault tolerance."

If more drives fail than the fault-tolerance method can manage, fault tolerance is compromised, and the logical drive fails. If this failure occurs, the operating system rejects all requests and indicates unrecoverable errors.

For example, compromised fault tolerance might occur when a drive in an array fails while another drive in the array is being rebuilt.

Compromised fault tolerance can also be caused by problems unrelated to drives. In such cases, replacing the physical drives is not required.
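
The rule above (more unavailable drives than the fault-tolerance method can manage) can be sketched as a small check. This is an illustrative model only, not controller firmware logic: the function name `is_compromised`, the `TOLERATED_FAILURES` table, and the choice to count a rebuilding drive as unavailable are assumptions made for this example. The tolerance figures are the general RAID definitions and are not taken from this manual.

```python
# Assumed per-level failure tolerances (general RAID definitions, not from this manual).
# RAID 1+0 is left out because its tolerance depends on which mirrored drives fail.
TOLERATED_FAILURES = {"RAID 0": 0, "RAID 5": 1, "RAID 6": 2}


def is_compromised(raid_level: str, failed: int, rebuilding: int = 0) -> bool:
    """Return True if more drives are unavailable than the level tolerates.

    A drive that is still being rebuilt does not yet hold complete data,
    so this sketch counts it as unavailable (an assumption of the model).
    """
    unavailable = failed + rebuilding
    return unavailable > TOLERATED_FAILURES[raid_level]


# One failed drive in a RAID 5 array: degraded, but fault tolerance holds.
print(is_compromised("RAID 5", failed=1))
# A second drive fails while the first replacement is rebuilding.
print(is_compromised("RAID 5", failed=1, rebuilding=1))
```

The second call corresponds to the example in the text: a drive fails while another drive in the same array is being rebuilt.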

Recovering from compromised fault tolerance

If fault tolerance is compromised, inserting replacement drives does not improve the condition of the logical volume. Instead, if the screen displays unrecoverable error messages, perform the following procedure to recover data:

1. Power down the entire system, and then power it back up. In some cases, a marginal drive will work again for long enough to enable you to make copies of important files.
   If a 1779 POST message is displayed, press the F2 key to re-enable the logical volumes. Remember that data loss has probably occurred and any data on the logical volume is suspect.

2. Make copies of important data, if possible.

3. Replace any failed drives.

4. After you have replaced the failed drives, fault tolerance may again be compromised. If so, cycle the power again. If the 1779 POST message is displayed:

   a. Press the F2 key to re-enable the logical drives.

   b. Recreate the partitions.

   c. Restore all data from backup.

To minimize the risk of data loss caused by compromised fault tolerance, make frequent backups of all logical volumes.

Replacing drives

The most common reason for replacing a drive is that it has failed. However, another reason is to gradually increase the storage capacity of the entire system.

For systems that support hot-pluggable drives, if you replace a failed drive that belongs to a fault-tolerant configuration while the system power is on, all drive activity in the array pauses for 1 or 2 seconds while the new drive is initializing. When the drive is ready, data recovery to the replacement drive begins automatically.

If you replace a drive belonging to a fault-tolerant configuration while the system power is off, a POST message appears when the system is next powered up. This message prompts you to press the F1 key to start automatic data recovery. If you do not enable automatic data recovery, the logical volume remains in a ready-to-recover condition and the same POST message appears whenever the system is restarted.
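
The two replacement cases described above can be summarized as a small decision function. This is a hypothetical model written for illustration, not a controller or utility API: the names `VolumeState` and `state_after_replacement` and the `f1_pressed` parameter are assumptions, and the sketch covers only a single failed drive in a fault-tolerant configuration.

```python
from enum import Enum


class VolumeState(Enum):
    """Hypothetical states used only in this sketch."""
    RECOVERING = "automatic data recovery in progress"
    READY_TO_RECOVER = "ready-to-recover"


def state_after_replacement(power_on: bool, f1_pressed: bool = False) -> VolumeState:
    """Model the logical volume state after a failed drive is replaced."""
    if power_on:
        # Hot-plug replacement: after a brief pause while the new drive
        # initializes, recovery to the replacement drive starts automatically.
        return VolumeState.RECOVERING
    # Replacement with power off: at the next power-up a POST message asks
    # for F1. Without F1 the volume stays ready-to-recover, and the same
    # message appears again at every restart.
    return VolumeState.RECOVERING if f1_pressed else VolumeState.READY_TO_RECOVER


print(state_after_replacement(power_on=True).value)
print(state_after_replacement(power_on=False).value)
print(state_after_replacement(power_on=False, f1_pressed=True).value)
```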
