Thresholding, Sel event log format for machine check errors, 3 thresholding – Dell PowerEdge 7250 User Manual



SR870BN4 Error Reference Guide

SR870BN4 Machine Check Error Handling

Revision 1.0


PCI Component | Critical Interrupt | PERR; SERR | PCI Bus, Device, Function info
Memory Device | Memory Error | Correctable; Uncorrectable | SMBIOS Type 16 0-based index; SMBIOS Type 17 0-based index
Other | Critical Interrupt | Bus Correctable error; Bus Uncorrectable error |

4.3 Thresholding

MCA classifies errors into one of three categories: corrected, recoverable, and fatal. In general,
corrected errors do not affect the operation of the system and therefore may occur repeatedly
(fatal and most recoverable errors result in a system reset). In some cases, such as a stuck bit
in a memory DIMM, a corrected error may occur at a very high frequency. In this scenario,
the system may experience performance degradation because of the excessive time spent
in the error logging routines. In addition, the BMC SEL has a finite size and may quickly
fill with duplicate errors. To help alleviate these problems, a thresholding algorithm has been
applied to the BMC SEL logging routines. If the threshold is crossed, a special “event disabled”
SEL entry is created and the BMC SEL logging code stops sending
platform event message commands for that error type to the BMC. This greatly reduces the
time spent in the SEL logging routines and avoids overrunning the BMC SEL
storage. Thresholding in no way affects the ability of the OS to receive notification of and
service CPEIs or CMCIs, nor does it disable any error correction logic in the chipset. Any
disabled event reporting is re-enabled on the next reboot.

Corrected errors are grouped into four categories: Processor, Memory, PCI PERR, and Generic
Bus. History for each category is maintained separately. Only corrected errors are
thresholded; recoverable and fatal errors are not. On the SR870BN4, each category is allowed
a maximum of 10 errors within one hour. If this threshold is crossed, a special ‘Event
Logging Disabled’ SEL entry is logged.
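
The thresholding behavior described above can be illustrated with a minimal Python sketch. This is not the SR870BN4 firmware code. The class name SelThrottle, the method report_corrected, and the in-memory list standing in for the BMC SEL are hypothetical. The sketch also assumes a sliding one-hour window and assumes the threshold counts as "crossed" on the 11th error within that window; the guide does not specify either detail.

```python
from collections import deque

# Values taken from the text: four categories, 10 errors per hour.
CATEGORIES = ("Processor", "Memory", "PCI PERR", "Generic Bus")
MAX_ERRORS = 10
WINDOW_SECONDS = 3600


class SelThrottle:
    """Hypothetical model of per-category corrected-error thresholding."""

    def __init__(self):
        # History is kept separately for each category.
        self.history = {c: deque() for c in CATEGORIES}
        # Disabled categories stay disabled until a new object is created,
        # which stands in for a reboot.
        self.disabled = set()
        # Stand-in for the BMC SEL: (time, category, description) tuples.
        self.sel = []

    def report_corrected(self, category, now):
        """Record one corrected error at time `now` (in seconds).

        Returns True if a normal SEL entry was logged, False otherwise.
        """
        if category in self.disabled:
            # No further platform event messages for this category.
            return False
        times = self.history[category]
        # Assumption: sliding window, so drop errors older than one hour.
        while times and now - times[0] >= WINDOW_SECONDS:
            times.popleft()
        times.append(now)
        if len(times) > MAX_ERRORS:
            # Assumption: the threshold is crossed on the 11th error.
            self.disabled.add(category)
            self.sel.append((now, category, "Event Logging Disabled"))
            return False
        self.sel.append((now, category, "Corrected error"))
        return True
```

Under these assumptions, eleven Memory errors within one hour produce ten normal entries followed by one Event Logging Disabled entry, and later Memory errors add nothing to the log. Errors in the other three categories continue to be logged, because each category keeps its own history.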

4.4 SEL Event Log Format for Machine Check Errors

The following table shows the machine check errors that will be logged for the SR870BN4, and
the corresponding SEL Event Log format. For details on System Management BIOS (SMBIOS)
Types 4, 16, and 17, refer to the System Management BIOS Reference Specification,
available at www.dmtf.org.
