Automatic data recovery failure, Compromised fault tolerance – HP Compaq Integrated Smart Array Controller User Manual

Understanding Drive Arrays B-25

Writer: CDresden Project: Compaq Integrated Smart Array Controller User Guide Comments:

Part Number: 153236-001 File Name: i-appb Understanding Drive Arrays.doc Last Saved On: 8/27/99 11:35 AM

As a rule of thumb, a rebuild takes approximately 15 minutes per gigabyte.
The actual rebuild time, however, depends on the Rebuild Priority setting,
the amount of I/O activity during the rebuild operation, the number of drives
in the array (RAID 5), and the disk drive speed.
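
The rule of thumb above can be turned into a quick back-of-the-envelope
estimate. The following is a minimal sketch only: the function names and the
9 GB example capacity are illustrative assumptions, not part of the controller
software, and the result ignores Rebuild Priority, I/O load, drive count, and
drive speed, all of which change the real figure.

```python
def estimate_rebuild_minutes(capacity_gb, minutes_per_gb=15):
    """Rough rebuild time from the ~15 minutes per gigabyte rule of thumb.

    capacity_gb is the capacity to be rebuilt (hypothetical parameter name).
    """
    if capacity_gb < 0 or minutes_per_gb <= 0:
        raise ValueError("capacity must be >= 0 and rate must be > 0")
    return capacity_gb * minutes_per_gb


def format_duration(total_minutes):
    """Format a number of minutes as 'H h MM min'."""
    hours, minutes = divmod(round(total_minutes), 60)
    return f"{hours} h {minutes:02d} min"


# Example: a 9 GB rebuild at the nominal rate.
print(format_duration(estimate_rebuild_minutes(9)))  # 2 h 15 min
```

Treat the output as a planning figure, not a guarantee; a busy system or a
low Rebuild Priority can lengthen the rebuild considerably.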

Automatic Data Recovery Failure

During Automatic Data Recovery, if the online LED of the replacement drive
stops blinking while all other drives in the array are still online, the
recovery process may have terminated abnormally because of an uncorrectable
read error from another physical drive. The background Auto-Reliability
Monitoring process is intended to help prevent this problem, but it cannot
guard against certain issues, such as SCSI bus signal integrity problems.
Restart the system; a POST message will confirm the diagnosis. Retrying
Automatic Data Recovery may help. If the retry does not help, back up all
data on the system, run a surface analysis (with the user diagnostics tools),
and perform a full restore.

During Automatic Data Recovery, if the online LED of the replacement drive
stops blinking and the replacement drive has failed (its amber failure LED is
illuminated or its other LEDs go out), the replacement drive is producing
unrecoverable disk errors. In this case, remove the replacement drive and
install another one.
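
The two failure cases above amount to a small decision table keyed on what
the LEDs show. The sketch below is one hypothetical way to write that table
down for an operations checklist. The names (Action, triage_recovery) and the
boolean inputs are assumptions made for illustration; nothing here reads the
LEDs or talks to the controller.

```python
from enum import Enum


class Action(Enum):
    WAIT = "Recovery still in progress; no action needed."
    REPLACE_REPLACEMENT = "Remove the replacement drive and install another one."
    RESTART_AND_RETRY = (
        "Restart, check the POST message, and retry Automatic Data Recovery; "
        "if the retry fails, back up, run a surface analysis, and restore."
    )
    OUT_OF_SCOPE = "Not covered by these two cases; investigate further."


def triage_recovery(led_blinking, replacement_failed, others_online):
    """Map operator observations made during recovery to a suggested action.

    led_blinking:       online LED of the replacement drive is still blinking
    replacement_failed: amber failure LED lit, or the drive's other LEDs are out
    others_online:      all other drives in the array are still online
    """
    if led_blinking:
        return Action.WAIT
    if replacement_failed:
        return Action.REPLACE_REPLACEMENT
    if others_online:
        return Action.RESTART_AND_RETRY
    return Action.OUT_OF_SCOPE


# Example: LED stopped blinking, replacement looks healthy, other drives online.
print(triage_recovery(False, False, True).value)
```

The final branch is deliberately a catch-all: if other drives have also gone
offline, the situation falls outside the two cases described here.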

Compromised Fault Tolerance

If fault tolerance is compromised because multiple drives have failed, the
condition of the logical drive will be “failed” and “unrecoverable” errors
will be returned to the host. Data loss is probable. Inserting replacement
drives at this point will not improve the condition of the logical drive.
Instead, first try turning the entire system off and then on again. In some
cases, an intermittent drive will appear to work again after a power cycle
(perhaps long enough to copy important files). If a 1779 POST message is
displayed, press F2 to re-enable the logical drive(s). Remember that data
loss has likely occurred and any data on the logical drive is suspect.

Fault tolerance may also be compromised by non-drive problems, such as a
faulty cable, a faulty storage system power supply, or a user accidentally
turning off an external storage system while the host system power is on. In
such cases, the physical drives do not need to be replaced. However, data
loss can still occur, especially if the system was busy at the time the
problem developed.
