Intel SE8500HW4 User Manual


Intel® Server Board Set SE8500HW4

System BIOS

Revision 1.0

Intel order number D22893-001


DIMM bank’s uncorrectable error count. If the error count is less than 10 per hour, the BIOS
reports the uncorrectable ECC error to the SEL. When the DIMM uncorrectable error count
reaches 10, the BIOS lights the bad DIMMs' LEDs and disables the DIMM bank for subsequent
boots. The system continues to function from redundant memory.
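
The threshold behavior described above can be sketched in Python as follows. This is an illustration only, not BIOS code. The class name DimmBankUncorrectableTracker, the action strings, and the modeling of "per hour" as a sliding one-hour window are assumptions made for the example; the text states only the threshold of 10 and the resulting actions.

```python
from collections import deque

UNCORRECTABLE_THRESHOLD = 10  # from the text: 10 uncorrectable errors
WINDOW_SECONDS = 3600         # assumption: "per hour" modeled as a sliding window


class DimmBankUncorrectableTracker:
    """Hypothetical model of uncorrectable ECC error handling for one DIMM bank."""

    def __init__(self):
        self.timestamps = deque()        # error times still inside the window
        self.led_on = False              # bad-DIMM LEDs lit
        self.disabled_next_boot = False  # bank disabled for subsequent boots

    def record_error(self, now):
        """Record one error at time `now` (seconds) and return the actions taken."""
        # Drop errors that have aged out of the one-hour window.
        while self.timestamps and now - self.timestamps[0] >= WINDOW_SECONDS:
            self.timestamps.popleft()
        self.timestamps.append(now)

        if len(self.timestamps) < UNCORRECTABLE_THRESHOLD:
            # Below the threshold: the error is reported to the SEL.
            return ["log_sel"]

        # Threshold reached: light the LEDs and disable the bank for later boots.
        # Whether a SEL entry is also written at this point is not stated in the text.
        self.led_on = True
        self.disabled_next_boot = True
        return ["light_dimm_leds", "disable_bank_next_boot"]
```

In this sketch, nine errors in quick succession each produce a SEL entry, and the tenth triggers the LED and bank-disable actions. Errors spaced far enough apart never accumulate to the threshold.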

Multiple consecutive uncorrectable ECC errors may cause an XMB fail condition and the entire
Memory Board to be disabled. When an XMB fail occurs, the BIOS is no longer able to access
the XMB registers in order to locate the failing DIMM(s). Hence, the BIOS does NOT light the
bad DIMM LED, log the failed DIMM information, or disable the failed DIMMs.

If the XMB fails due to uncorrectable ECC errors while the system is operating in a redundant
state, the system continues operation in a non-redundant state. The BIOS logs a SEL event to
indicate that an uncorrectable ECC error has occurred on the failed Memory Board. The BIOS
also sends commands to the BMC to update the DIMM state to “Not Present”. The user may
perform a memory hot replace operation to replace the bad Memory Board with a good Memory
Board to restore the system to redundant mode.

If multiple uncorrectable ECC errors occur while the system is operating in non-redundant
mode, the system will hang.
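
The outcomes described in the preceding paragraphs can be summarized as a small decision function. This is a rough Python sketch, not an Intel implementation: the function name handle_xmb_fail, the state names, and the action strings are all hypothetical, and the non-redundant branch treats the "multiple uncorrectable ECC errors" case as equivalent to an XMB fail for simplicity.

```python
def handle_xmb_fail(redundant, failed_board):
    """Return (system_state, actions) after an XMB fail caused by uncorrectable ECC errors.

    `redundant` says whether the system was in a redundant memory state;
    `failed_board` is a label for the affected Memory Board (hypothetical).
    """
    if not redundant:
        # From the text: in non-redundant mode the system hangs.
        return "hang", []

    # The XMB registers are unreachable, so the failing DIMM cannot be located:
    # no DIMM LED is lit, no per-DIMM information is logged, no DIMM is disabled.
    actions = [
        "log_sel: uncorrectable ECC error on " + failed_board,
        "bmc: set DIMM state to Not Present on " + failed_board,
    ]
    return "running_non_redundant", actions
```

For example, handle_xmb_fail(True, "Memory Board A") yields the non-redundant running state with two board-level actions, after which a hot replace of the board would be needed to restore redundancy.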

When a correctable ECC error occurs during runtime, the DIMM correctable error count is
incremented. If the error count is less than the error stop report threshold, the BIOS reports the
correctable ECC error to the SEL. If the board containing the DIMM with the correctable error
has available spares, the error stop report threshold shall be the same as the error threshold for
switching to spare. If the board has no available spare, the error stop report threshold shall be
10 errors per hour. When the error count reaches the error stop report threshold, the BIOS
reports to the SEL that the correctable error stop report threshold has been reached and stops
reporting subsequent correctable ECC errors for the DIMM. If a spare Rank is available on the
Memory Board with the error when the error threshold for switching to spare is reached, the system
copies the contents of the bad Rank to the spare Rank, switches to the spare Rank, sets the
Memory Board LED to indicate the bad DIMM(s) and disables the bad DIMM bank and sparing
for subsequent boots. With sparing disabled, the ranks previously reserved for spares are used
for system memory.

Any disabled event reporting will be re-enabled on the next reboot.
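
The correctable-error flow above can be sketched in Python as follows. This is a simplified model under stated assumptions, not BIOS code: the class name DimmCorrectableTracker and the action strings are hypothetical, the spare-switch threshold is passed in as a parameter because the text does not give its value, the per-hour windowing is omitted for brevity, and resetting the count on reboot is an assumption.

```python
NO_SPARE_STOP_REPORT_THRESHOLD = 10  # from the text: 10 errors per hour with no spare


class DimmCorrectableTracker:
    """Hypothetical model of correctable ECC error handling for one DIMM."""

    def __init__(self, spare_available, spare_switch_threshold):
        self.spare_available = spare_available
        # Placeholder value: the text does not state the spare-switch threshold.
        self.spare_switch_threshold = spare_switch_threshold
        self.count = 0
        self.reporting_enabled = True
        self.bank_disabled_next_boot = False
        self.sparing_disabled_next_boot = False

    def stop_report_threshold(self):
        # With a spare available, the stop-report threshold equals the
        # threshold for switching to the spare; otherwise it is 10.
        if self.spare_available:
            return self.spare_switch_threshold
        return NO_SPARE_STOP_REPORT_THRESHOLD

    def record_error(self):
        """Count one correctable error and return the actions taken."""
        self.count += 1
        if not self.reporting_enabled:
            return []  # reporting for this DIMM has already been stopped
        if self.count < self.stop_report_threshold():
            return ["log_sel_correctable_error"]

        # Threshold reached: log it once, then stop further reports.
        actions = ["log_sel_stop_report_threshold_reached"]
        self.reporting_enabled = False
        if self.spare_available:
            actions += [
                "copy_bad_rank_to_spare",
                "switch_to_spare_rank",
                "set_memory_board_led",
            ]
            self.spare_available = False
            self.bank_disabled_next_boot = True
            self.sparing_disabled_next_boot = True
        return actions

    def reboot(self):
        """Disabled event reporting is re-enabled on the next reboot."""
        self.count = 0  # assumption: the text does not say the count is reset
        self.reporting_enabled = True
```

With a spare available and a placeholder switch threshold of 3, the third error triggers both the stop-report SEL entry and the switch to the spare Rank. Without a spare, only the stop-report entry is logged, at the tenth error.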

10.3.3 I/O Devices
