Effects of a hard drive failure, Compromised fault tolerance, Recovering from compromised fault tolerance – HP 60-Modular-Smart-Array User Manual


Troubleshooting 30

For additional information about diagnosing hard drive problems, see the HP ProLiant Servers Troubleshooting Guide.

CAUTION: Sometimes a drive that has previously failed may appear to be operational after the system is power-cycled or, for a hot-pluggable drive, after the drive has been removed and reinserted. However, continued use of such a marginal drive may eventually result in data loss. Replace the marginal drive as soon as possible.

Effects of a hard drive failure

When a hard drive fails, all logical drives in the same array are affected. Each logical drive in an array may use a different fault-tolerance method, so each logical drive can be affected differently.

RAID 0 configurations cannot tolerate drive failure. If any physical drive in the array fails, all non-fault-tolerant (RAID 0) logical drives in the same array also fail.

With no spares assigned, RAID 1+0 configurations can tolerate multiple drive failures, as long as no two failed drives are mirrored to each other.

With no spares assigned, RAID 5 configurations can tolerate the failure of one drive.

With no spares assigned, RAID 6 (ADG) configurations can tolerate the simultaneous failure of two drives.
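The tolerance rules above can be summarized as a small decision function. The following is a minimal Python sketch, not an HP utility and not the array controller's actual logic. It assumes no spares are assigned, identifies drives by hypothetical integer indices, and expects the caller to supply the RAID 1+0 mirror pairs through a hypothetical mirror_pairs argument. Because a drive failure affects every logical drive in the array, the example evaluates several logical drives against the same set of failed drives.

```python
from __future__ import annotations

from typing import Iterable, Tuple


# Hypothetical model: drives are integer indices and no spares are assigned.
def logical_drive_survives(
    level: str,
    failed: set[int],
    mirror_pairs: Iterable[Tuple[int, int]] = (),
) -> bool:
    """Return True if a logical drive at `level` survives the failed drives."""
    if level == "RAID 0":
        # No fault tolerance: any failed drive in the array fails the logical drive.
        return not failed
    if level == "RAID 1+0":
        pairs = list(mirror_pairs)
        if not pairs:
            raise ValueError("RAID 1+0 needs the mirror pairs of the array")
        # Any number of failures is tolerated unless both drives of a pair failed.
        return not any(a in failed and b in failed for a, b in pairs)
    if level == "RAID 5":
        # Tolerates the failure of one drive.
        return len(failed) <= 1
    if level == "RAID 6":
        # RAID 6 (ADG): tolerates two simultaneously failed drives.
        return len(failed) <= 2
    raise ValueError(f"unsupported RAID level: {level!r}")


if __name__ == "__main__":
    # Hypothetical six-drive array: drives 0-5, mirrored as (0,1), (2,3), (4,5).
    pairs = [(0, 1), (2, 3), (4, 5)]
    failed = {0, 3}
    logical_drives = {"LD1": "RAID 0", "LD2": "RAID 1+0", "LD3": "RAID 5", "LD4": "RAID 6"}
    for name, level in logical_drives.items():
        ok = logical_drive_survives(level, failed, pairs)
        print(f"{name} ({level}): {'OK' if ok else 'FAILED'}")
```

With drives 0 and 3 failed, this sketch reports the RAID 0 and RAID 5 logical drives as failed and the RAID 1+0 and RAID 6 logical drives as still available, illustrating how one failure event affects logical drives in the same array differently.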

Compromised fault tolerance

If more hard drives fail than the fault-tolerance method allows, fault tolerance is compromised and the logical drive fails. In this case, all requests from the operating system are rejected with unrecoverable errors. You are likely to lose data, although it can sometimes be recovered.

One example of a situation in which compromised fault tolerance may occur is when a drive in an array fails while another drive in the array is being rebuilt. If the array has no online spare, any logical drives in this array that are configured with RAID 5 fault tolerance will fail.

Compromised fault tolerance can also be caused by non-drive problems, such as a faulty cable or temporary power loss to a storage system. In such cases, you do not need to replace the physical drives. However, you may still have lost data, especially if the system was busy at the time the problem occurred.

Recovering from compromised fault tolerance

If fault tolerance is compromised, inserting replacement drives does not improve the condition of the logical volume. Perform the following procedure to recover data:

1. Check for loose, dirty, broken, or bent cabling and connectors on all devices.
2. Power down the storage enclosure ("Power down the server" on page 11).
3. Power up the storage enclosure ("Power up" on page 11).
   In some cases, a marginal drive is operational long enough to allow backup of important files.
4. Make copies of important data, if possible.
5. Replace any failed drives.

Factors to consider before replacing hard drives

You can replace hard drives without powering down the system. However, before replacing a degraded drive:
