


Hard Drive Installation and Replacement


HP Smart Array 5300 Controller User Guide

HP CONFIDENTIAL

Writer: Jennifer Hayward File Name: o-appe hard drive installation and replacement

Codename: SilverHammer Part Number: 135606-005 Last Saved On: 10/8/02 11:22 AM

There are several other ways to recognize that a hard drive has failed:

• The amber LED on the front of a storage system lights up if the system contains
failed drives. (Other problems, such as fan failure, redundant power supply failure,
or over-temperature conditions, also cause this LED to light up.)

• A Power-On Self-Test (POST) message lists failed drives whenever the system is
restarted, as long as the controller detects one or more good drives. Refer to
Appendix G for an explanation of POST messages.

• The Array Diagnostic Utility (ADU) lists all failed drives.

Insight Manager can also detect failed drives remotely across a network.

For additional information about hard drive problems, refer to the HP Servers
Troubleshooting Guide.
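The indicators listed above are not equally conclusive: the amber LED has several possible causes, and POST names failed drives only while the controller still detects at least one good drive. The following Python sketch encodes that reasoning for illustration only. The `Indicators` class, its fields, and the drive labels are hypothetical and do not correspond to any HP utility or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Indicators:
    """Hypothetical snapshot of the failed-drive indicators (not an HP API)."""

    amber_led_on: bool = False
    # Drives named in POST messages after a restart.
    post_failed_drives: list[str] = field(default_factory=list)
    # Drives reported as failed by the Array Diagnostic Utility (ADU).
    adu_failed_drives: list[str] = field(default_factory=list)


def confirmed_failed_drives(ind: Indicators) -> set[str]:
    """Return the drives that POST or the ADU explicitly names as failed."""
    # POST lists failed drives only if the controller still detects at least
    # one good drive, so an empty POST list does not rule out a failure.
    return set(ind.post_failed_drives) | set(ind.adu_failed_drives)


def needs_further_checks(ind: Indicators) -> bool:
    """Return True if the amber LED is lit but no failed drive has been named."""
    # The amber LED also lights for fan failure, redundant power supply
    # failure, or over-temperature, so by itself it does not prove a drive failed.
    return ind.amber_led_on and not confirmed_failed_drives(ind)
```

For example, `needs_further_checks(Indicators(amber_led_on=True))` returns `True`: the LED is lit, but nothing yet identifies a failed drive, so running the ADU is a reasonable next step.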

Compromised Fault Tolerance

Compromised fault tolerance commonly occurs when more physical drives have
failed than the fault-tolerance method can withstand. In this case, the logical volume
fails and unrecoverable disk error messages are returned to the host. Data loss is
likely.

For example, one drive in an array might fail while another drive in the same array
is still being rebuilt. If the array has no online spare, any logical drives on the array
that are configured with RAID 5 fault tolerance will fail.
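The rule above compares the number of unavailable drives with what the fault-tolerance method can survive. The following Python sketch shows that comparison under stated assumptions. The `MAX_TOLERATED` table holds assumed, illustrative values: RAID ADG support on this controller is an assumption, and online spares are not modeled. The function name is hypothetical. A drive that is still being rebuilt is counted as unavailable, which is why the RAID 5 example above fails.

```python
# Assumed number of unavailable physical drives each fault-tolerance method is
# guaranteed to survive. Illustrative values only; online spares are not modeled.
MAX_TOLERATED = {
    "RAID 0": 0,    # no fault tolerance
    "RAID 1": 1,    # mirroring
    "RAID 1+0": 1,  # survives more only if failures hit different mirror pairs
    "RAID 5": 1,    # distributed parity
    "RAID ADG": 2,  # Advanced Data Guarding (assumed available)
}


def fault_tolerance_compromised(raid_level: str, failed: int, rebuilding: int = 0) -> bool:
    """Return True if a logical drive at `raid_level` is expected to fail."""
    if failed < 0 or rebuilding < 0:
        raise ValueError("drive counts must be non-negative")
    # A drive that is still being rebuilt does not yet hold complete data,
    # so it is counted as unavailable alongside drives that have failed.
    unavailable = failed + rebuilding
    return unavailable > MAX_TOLERATED[raid_level]
```

With these assumptions, `fault_tolerance_compromised("RAID 5", failed=1, rebuilding=1)` returns `True`, which matches the example of a RAID 5 logical drive losing a second drive during a rebuild.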

Compromised fault tolerance may also be caused by non-drive problems, such as
temporary power loss to a storage system or a faulty cable. In such cases, the physical
drives do not need to be replaced. However, data may still have been lost, especially
if the system was busy when the problem occurred.

Procedure to Attempt Recovery

When fault tolerance has been compromised, inserting replacement drives does not
improve the condition of the logical volume. Instead, if your screen displays
unrecoverable error messages, try the following procedure to recover data.
