Detecting double-bit errors, Fixing double-bit errors – Juniper Networks E-Series User Manual



Double-Bit Errors on SRP Modules

E-Series Routers


If ECC detects a single-bit error, it automatically corrects the error,
and operation continues.

If ECC detects a double-bit error, it logs the error, stops the main
processor on the controller, and takes the SRP module offline.
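The manual does not say which error-correcting code the SRP module's memory controller implements. The sketch below shows the correct-one, detect-two behavior described above using a textbook SEC-DED (single-error-correct, double-error-detect) scheme: an extended Hamming code that protects a 64-bit word with 8 check bits. The code choice and the names `encode` and `decode` are assumptions for illustration, not the SRP implementation. A single flipped bit produces odd overall parity plus a syndrome that points at the bad position, so it can be flipped back. Two flipped bits produce even parity with a nonzero syndrome, which can be detected but not located.

```python
from typing import List, Optional, Tuple

# Assumed layout (textbook extended Hamming code, not the SRP's actual code):
# index 0 holds the overall parity bit; indexes 1..71 are Hamming positions.
CODE_LEN = 72


def _is_data_position(pos: int) -> bool:
    # Hamming check bits sit at power-of-two positions (1, 2, 4, ..., 64).
    return pos & (pos - 1) != 0


def encode(data: int) -> List[int]:
    """Encode a 64-bit word as a 72-bit SEC-DED codeword (list of 0/1)."""
    code = [0] * CODE_LEN
    bit = 0
    for pos in range(1, CODE_LEN):
        if _is_data_position(pos):
            code[pos] = (data >> bit) & 1
            bit += 1
    # XOR together the positions of all set data bits, then set each check
    # bit so the XOR over every set position becomes zero.
    s = 0
    for pos in range(1, CODE_LEN):
        if code[pos]:
            s ^= pos
    for p in range(7):
        code[1 << p] = (s >> p) & 1
    code[0] = sum(code[1:]) & 1  # overall parity: total number of 1s is even
    return code


def _extract(code: List[int]) -> int:
    data, bit = 0, 0
    for pos in range(1, CODE_LEN):
        if _is_data_position(pos):
            data |= code[pos] << bit
            bit += 1
    return data


def decode(received: List[int]) -> Tuple[str, Optional[int]]:
    """Return ("ok" | "corrected", data) or ("uncorrectable", None)."""
    code = list(received)  # work on a copy; leave the caller's list intact
    syndrome = 0
    for pos in range(1, CODE_LEN):
        if code[pos]:
            syndrome ^= pos
    parity_error = sum(code) & 1
    if syndrome == 0 and not parity_error:
        return "ok", _extract(code)
    if parity_error and syndrome < CODE_LEN:
        # Single-bit error: the syndrome names the flipped position
        # (syndrome 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        return "corrected", _extract(code)
    # Nonzero syndrome with even parity (a double-bit error), or an
    # out-of-range syndrome: detected but not correctable.
    return "uncorrectable", None


if __name__ == "__main__":
    word = 0x0123456789ABCDEF
    single = encode(word)
    single[10] ^= 1
    print(decode(single)[0])  # corrected
    double = encode(word)
    double[10] ^= 1
    double[20] ^= 1
    print(decode(double)[0])  # uncorrectable
```

The "uncorrectable" result corresponds to the point where, per the text above, the SRP module logs the error and stops rather than continuing with bad data.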

Detecting Double-Bit Errors

The following message appears on the console if ECC detects a
double-bit error:

ALERT 05/10/2000 13:10:33 os: failed: ECC DOUBLE BIT ERROR OCCURRED
Address = 0xe95db10
Data (Upper 32Bits) = 0xe95db20
Data (Lower 32Bits) = 0x55d06c
ECC Data Bits = 0x2b
ECC 1Bit Error Counter = 0x0
*** YOU MUST PERFORM A HARD RESET TO CONTINUE ***
ALERT 05/10/2000 13:10:34 os: PROCESSOR EXCEPTION: 0x200n
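If console output is forwarded to a log collector, a small parser can pull the hexadecimal fields out of this alert for later troubleshooting. The sketch below is hypothetical: `parse_ecc_alert` is not a Juniper tool, and it assumes the one-field-per-line `Name = 0x...` layout of the sample message; other software releases may format the alert differently.

```python
import re
from typing import Dict

# Assumed layout: one "Name = 0xHEX" field per line, as in the sample alert.
_FIELD_RE = re.compile(
    r"^[ \t]*([^=\n]+?)[ \t]*=[ \t]*(0x[0-9A-Fa-f]+)[ \t]*$", re.MULTILINE
)


def parse_ecc_alert(text: str) -> Dict[str, int]:
    """Return the hex fields of an ECC double-bit alert, keyed by field name."""
    if "ECC DOUBLE BIT ERROR" not in text:
        raise ValueError("text is not an ECC double-bit error alert")
    return {name: int(value, 16) for name, value in _FIELD_RE.findall(text)}


# Sample text copied from the console message shown in this manual.
SAMPLE = """\
ALERT 05/10/2000 13:10:33 os: failed: ECC DOUBLE BIT ERROR OCCURRED
Address = 0xe95db10
Data (Upper 32Bits) = 0xe95db20
Data (Lower 32Bits) = 0x55d06c
ECC Data Bits = 0x2b
ECC 1Bit Error Counter = 0x0
*** YOU MUST PERFORM A HARD RESET TO CONTINUE ***
"""

if __name__ == "__main__":
    fields = parse_ecc_alert(SAMPLE)
    print(hex(fields["Address"]))  # 0xe95db10
```

Lines without an `=` sign, such as the ALERT header and the hard-reset banner, are simply skipped, so a header that wraps onto a second line does not affect the result.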

If ECC detects a double-bit error in a system that contains a redundant
SRP module, the redundant module becomes active and the system
continues to operate. However, you must still troubleshoot the SRP
module with the double-bit error. If ECC detects a double-bit error in a
system that does not contain a redundant SRP module, you must
troubleshoot the SRP module immediately. See "Fixing Double-Bit Errors."

Fixing Double-Bit Errors

To fix a double-bit error:

1. Remove the second SRP module, if there is one.
2. Reboot the system with the board reset button on the primary SRP
module (see Figure 1-12).

These actions attempt to correct a transient double-bit error. However, if
the console displays a memory test failure for the SRP module after you
reboot, or if the FAIL LED on the SRP module stays on during the reboot,
the SDRAM is permanently damaged and must be replaced. In that case,
call Juniper Networks Customer Service to arrange for repair.
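The post-reboot check above reduces to a simple rule: either symptom on its own means permanent SDRAM damage. The sketch below encodes that rule; the names `Verdict` and `diagnose_after_reboot` are hypothetical, and treating "neither symptom" as a cleared transient error is a reading of the text rather than an explicit statement in it.

```python
from enum import Enum


class Verdict(Enum):
    # Messages paraphrase this manual's post-reboot guidance.
    SDRAM_DAMAGED = "SDRAM permanently damaged: call Juniper Networks Customer Service"
    NO_PERMANENT_DAMAGE = "no memory test failure and no stuck FAIL LED: likely transient"


def diagnose_after_reboot(memory_test_failed: bool, fail_led_stayed_on: bool) -> Verdict:
    """Classify the SRP module after removing the second SRP and pressing reset.

    memory_test_failed: the console reported a memory test failure for the SRP.
    fail_led_stayed_on: the FAIL LED stayed on during the reboot.
    """
    if memory_test_failed or fail_led_stayed_on:
        return Verdict.SDRAM_DAMAGED
    return Verdict.NO_PERMANENT_DAMAGE


if __name__ == "__main__":
    print(diagnose_after_reboot(memory_test_failed=False, fail_led_stayed_on=True).name)
    # SDRAM_DAMAGED
```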
