Partial chipset degradation – NEC 1000 Series User Manual
Page 7
Memory Mirroring
Continuous operation even in the event of a non-correctable memory error
The Express5800/1000 series server supports high-level memory 
RAS features to ensure that the server can rapidly detect memory 
errors, reduce multi-bit errors and continue operating even in 
the event of memory chip or memory controller failures. Memory 
scan, memory chip sparing (SDDC*) and memory scrubbing are 
examples of those features.
A memory scan is run on all loaded memory modules at each OS
boot. If the system detects a memory failure, the failed component 
is immediately isolated and detached from the system, preventing 
possible downtime during business operations.
Chip sparing (SDDC*) memory is a memory system loaded with 
several DRAM chips that can correct errors at the chip level. If 
a failure occurs in the memory, the error can be corrected 
immediately to allow for continuous operation.
Memory scrubbing checks memory content regularly (every few 
milliseconds) during operation without affecting performance. 
When an error is detected, it is corrected and then reported. 
The scrubbing function detects errors in a timely manner, 
which ultimately reduces the number of multi-bit errors.
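
The scrubbing cycle described above can be sketched in a few lines of Python. This is only an illustration, not the server's implementation: the manual does not say which error-correcting code the hardware uses, so a textbook Hamming(7,4) code stands in for it, and the names syndrome, encode and scrub are hypothetical.

```python
def syndrome(word):
    """XOR of the 1-based positions of all set bits; 0 means no error."""
    s = 0
    for pos in range(1, 8):
        if (word >> (pos - 1)) & 1:
            s ^= pos
    return s


def encode(nibble):
    """Encode 4 data bits as a 7-bit Hamming(7,4) codeword (assumed code)."""
    word = 0
    # Data bits occupy positions 3, 5, 6 and 7.
    for i, pos in enumerate((3, 5, 6, 7)):
        if (nibble >> i) & 1:
            word |= 1 << (pos - 1)
    # Parity bits at positions 1, 2 and 4 bring the syndrome to zero.
    s = syndrome(word)
    for pos in (1, 2, 4):
        if s & pos:
            word |= 1 << (pos - 1)
    return word


def scrub(memory):
    """Walk every word, correct single-bit errors in place, then report them."""
    report = []
    for index, word in enumerate(memory):
        s = syndrome(word)
        if s:
            memory[index] = word ^ (1 << (s - 1))  # flip the faulty bit back
            report.append((index, s))              # (word index, bit position)
    return report
```

Correcting each single-bit error as soon as a pass finds it is what keeps a second flipped bit from accumulating in the same word and turning it into a multi-bit error.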
Memory mirroring takes place continuously: the same data is 
written to two separate memory blocks instead of one (available 
only on the 1160Xf and 1320Xf). Because the data exists on two 
independent blocks, operations can continue without interruption 
in the event of a non-correctable error. 
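
As a rough illustration of the mirroring idea, the toy model below writes every value to two independent blocks and falls back to the second copy when the first reports a non-correctable error. The class MirroredMemory and its methods are hypothetical names chosen for this sketch. They do not describe the server's hardware interface.

```python
class MirroredMemory:
    """Toy model: every write goes to two independent memory blocks."""

    def __init__(self, size):
        self.primary = [0] * size
        self.mirror = [0] * size
        # Addresses with a non-correctable error, tracked per block.
        self.bad = {"primary": set(), "mirror": set()}

    def write(self, addr, value):
        # The same data is written to both blocks on every write.
        self.primary[addr] = value
        self.mirror[addr] = value

    def inject_error(self, addr, block="primary"):
        # Simulate a non-correctable error at one address of one block.
        self.bad[block].add(addr)

    def read(self, addr):
        # Serve the read from whichever copy is still intact.
        if addr not in self.bad["primary"]:
            return self.primary[addr]
        if addr not in self.bad["mirror"]:
            return self.mirror[addr]
        raise RuntimeError("both copies failed at address %d" % addr)
```

In this model a read only fails when both copies of the same address are lost, which is the property that lets operations continue after a single non-correctable error.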
Partial Chipset Degradation
Avoid multi-partition shutdowns resulting from chipset failures
When multiple server partitions share a common crossbar 
controller, a failure in a single partition may result in a 
multi-partition shutdown. To avoid this, the 
Express5800/1000 series servers have been designed to allow for 
the partial degradation of chipsets.
Each of the LSI chips that make up the chipset contains multiple 
LSI sub-units. These sub-units are connected to other sub-units 
located on separate LSI chips. The combined sub-units together 
make up a single partition. If an error occurs on an LSI sub-unit, 
that sub-unit alone can be degraded to isolate the failure to a 
single partition, thus preventing the failure from spreading to 
other partitions.
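
The isolation rule above can be modelled as follows. This is a minimal sketch under stated assumptions: the SubUnit and Chipset classes are hypothetical, and the layout of two chips with one sub-unit per partition is chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SubUnit:
    chip: str        # LSI chip that hosts this sub-unit
    partition: int   # partition the sub-unit belongs to
    degraded: bool = False


@dataclass
class Chipset:
    sub_units: list = field(default_factory=list)

    def degrade(self, chip, partition):
        # Degrade only the failing sub-unit, never the whole chip.
        for unit in self.sub_units:
            if unit.chip == chip and unit.partition == partition:
                unit.degraded = True

    def affected_partitions(self):
        return {u.partition for u in self.sub_units if u.degraded}

    def unaffected_partitions(self):
        every = {u.partition for u in self.sub_units}
        return every - self.affected_partitions()


# Two chips, each holding one sub-unit for partition 0 and one for partition 1.
chipset = Chipset([SubUnit(chip, part)
                   for chip in ("crossbar A", "crossbar B")
                   for part in (0, 1)])
chipset.degrade("crossbar A", 0)
```

After the call to degrade, only partition 0 is affected. Partition 1 keeps running even though it shares the same chip.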
Furthermore, after isolating the failed subsystem, the downed 
partition can automatically reboot itself and resume operations 
in a degraded mode without the intervention of a system 
administrator. On the Express5800/1000 series servers, this is 
made possible by the redundant paths between the Cells and the 
IO.
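
The role of the redundant paths in the automatic reboot can be pictured with a short sketch. The function reboot_degraded and the path names are hypothetical; the manual does not describe how the firmware selects paths.

```python
def reboot_degraded(paths, failed):
    """Return the Cell-to-IO paths a partition can still use after
    the failed one has been isolated (illustrative model only)."""
    remaining = [p for p in paths if p != failed]
    if not remaining:
        # Without a redundant path the partition cannot come back up.
        raise RuntimeError("no redundant path left; partition stays down")
    return remaining
```

For example, reboot_degraded(["crossbar A", "crossbar B"], "crossbar A") leaves the partition running on the path through crossbar B alone, which corresponds to the degraded mode described above.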
[Figure: Memory image. Unit of degradation on the Express5800/1000 Series. Labels: CPU, Cell Controller, Memory Controller, Memory I/F, Data 0 to Data 3 (shown twice, marked Mirror). Legend: components covered by the memory mirroring; components covered by the standard chip sparing.]
[Figure: Partial degradation. Labels: Cell 0, Cell 1, Crossbar Controller A, Crossbar Controller B, Sub Unit, PCIBox 0, PCIBox 1. Legend: n specifies the partition number; sub-units within the chipset (additional sub-sets exist in actuality); Failure; Not affected. Sequence: a failure occurs at the sub-unit of the crossbar controller; Partition 0 is shut down so that the failed component can be isolated; Partition 0 is rebooted.]
Memory mirroring allows for continuous operation through all 
non-correctable memory errors, whether they occur in the memory 
itself, in the memory interfaces or in the memory controllers.
* Single Device Data Correction