Chapter 11. ssa error logs, Error logging, Summary – Compex Systems Advanced SerialRAID Adapters SA33-3285-02 User Manual


Chapter 11. SSA Error Logs

This chapter describes:

- Error logging
- Error logging management
- Error log analysis
- Good housekeeping

Each topic is discussed as a summary, then as a detailed description.

The summaries provide all the information that you need for routine service operations
on SSA subsystems. For these operations, you have no need to inspect the system
error log, or to attempt to analyze the contents of the log.

The detailed descriptions help you understand the meaning of the error log data so that
you can analyze the error log further. For example, you might decide to fail over an
HACMP system when particular critical failures are logged.

Error Logging

Summary

Hardware errors can be detected by an SSA disk drive, an SSA adapter, or the SSA
device driver. The SSA adapter performs error recovery for disk drive errors; the SSA
device driver performs error recovery for the SSA adapter. When a problem is detected
that needs to be logged, all the relevant data is sent to the error logging service in the
device driver. The error logging service then sends the data to the system error logger.

SSA errors are logged asynchronously; that is, independently of any system I/O activity.
For example, if an SSA cable is unexpectedly disconnected, an Open Serial Link error
is logged immediately. The SSA subsystem does not wait for a read or write command
before it logs the error.

Sometimes, on the SSA network, the SSA adapter and SSA disk drives detect errors
that were possibly caused by activities elsewhere on the network. (Such activities might
be the rebooting of another using system, a system upgrade, or maintenance.) These
errors do not need any service action, and should not cause any problem unless the
automatic error log analysis determines that the error is critical.

Because SSA subsystems are designed for high availability, most subsystem errors do
not cause I/O operations to fail. Some errors, therefore, might not be obvious to the
user. To ensure that the user knows about such errors, a health-check is sent to the
adapter each hour. A cron table entry starts this health-check by running the
run_ssa_healthcheck shell script once each hour. When an SSA adapter receives
a health-check, it logs any currently-active errors and conditions that it knows exist on
the SSA subsystem.
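As a rough illustration of the hourly schedule described above, the sketch below checks whether a cron table contains an hourly entry for run_ssa_healthcheck. The helper name has_hourly_healthcheck is hypothetical. The sample entry in the comment, including the /usr/lpp/diagnostics/bin path and the output redirection, is an assumption about a typical installation rather than text from this chapter. Check the actual cron table on the using system.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SSA software): reads cron table text
# on stdin and succeeds if an uncommented entry runs run_ssa_healthcheck
# at a fixed minute of every hour.
#
# Assumed example of such an entry (path and redirection are assumptions):
#   0 * * * * /usr/lpp/diagnostics/bin/run_ssa_healthcheck 1>/dev/null 2>/dev/null
has_hourly_healthcheck() {
    awk '
        /^[ \t]*#/ { next }                      # ignore commented-out entries
        /run_ssa_healthcheck/ &&
        $1 ~ /^[0-9]+$/ &&                       # minute field: one fixed minute
        $2 == "*" && $3 == "*" &&                # every hour, every day of month
        $4 == "*" && $5 == "*" { found = 1 }     # every month, every weekday
        END { exit (found ? 0 : 1) }
    '
}
```

A possible use is `crontab -l | has_hourly_healthcheck && echo "health-check is scheduled hourly"`, run as the user that owns the cron table entry.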
