CPU protection, ECC on caches, Automatic CPU deconfiguration – HP RX8620-32 User Manual


CPU protection
The central processing unit is often a major cause of system downtime. For instance, CPU cache errors
have been shown to be a large contributor (in many cases, the greatest contributor) to unplanned
system downtime. Furthermore, adding or modifying CPU resources is among the highest-ranking
causes of planned hardware downtime. In the Integrity rx7620-16 and rx8620-32 Servers, HP has
designed specific features to combat CPU-caused downtime, including the following:

• Full error checking and correcting (ECC) on all caches
• Automatic deconfiguration of “faulty” CPUs (known as dynamic processor resilience [DPR])
• A highly effective and reliable CPU cooling scheme
• CPU hot-spares using HP Instant Capacity
• Redundant CPU power converters

ECC on caches
The CPU caches in the Integrity rx7620-16 and rx8620-32 Servers are fully protected from single-bit
hard errors and from random soft errors caused by cosmic rays or other intermittent error sources.
Some competing systems in the same class are not similarly protected, resulting in errors that
are hard to debug and that are, in many cases, blamed on the customer environment. Cache
errors in these unprotected systems can result in failures that bring down multiple partitions.

Another advantage of the Integrity rx7620-16 and rx8620-32 Server CPU cache is its layout, which
significantly reduces the chance of a multi-bit error due to a random cosmic ray strike. Such attention
to detail is not found in many designs available from other vendors.
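
The single-bit protection described above can be illustrated with a textbook single-error-correcting, double-error-detecting (SECDED) code. The sketch below is an assumption for illustration only: the manual does not say which ECC code the server caches use, and the 4-bit data width and the function names `encode` and `decode` are hypothetical. It shows how one flipped bit is repaired and how two flipped bits are flagged instead of being silently miscorrected.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit SECDED word (Hamming(7,4) plus overall parity).

    Illustrative only: real cache ECC protects much wider words.
    """
    data = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0 = overall parity, 1..7 = Hamming positions
    code[3], code[5], code[6], code[7] = data
    code[1] = code[3] ^ code[5] ^ code[7]   # parity over positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # parity over positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # parity over positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes the total number of 1 bits even
    return code


def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position        # XOR of set positions is 0 for a valid word
    parity_odd = sum(code) % 2 == 1

    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        # Odd overall parity means one bit flipped; the syndrome names its
        # position (0 means the overall parity bit itself).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return None, "uncorrectable"

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status
```

For example, flipping any one bit of `encode(0b1011)` still decodes to `0b1011` with status `corrected`, while flipping two bits yields status `uncorrectable`.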

Automatic CPU deconfiguration
Dynamic processor resilience (DPR) refers to the ability of the system to detect CPUs that are
generating an excessive number of recoverable cache errors, de-allocate them, and swap in spare
CPUs, all while the system remains online. Removing such a CPU early protects the customer against
the extremely unlikely event that its errors escalate to a double-bit cache error. This is one
example of the self-healing features of the HP hardware. Using this feature causes no
downtime or performance loss. This feature is not currently supported with Windows or Linux.
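
The detect, de-allocate, and swap sequence described above can be sketched as a simple threshold policy. Everything concrete here is an assumption made for illustration: the manual does not give the error threshold, the time window, or any software interface, and the class name `DprMonitor` and its methods are hypothetical.

```python
from collections import deque


class DprMonitor:
    """Toy model of threshold-based CPU deconfiguration (hypothetical policy)."""

    def __init__(self, active, spares, threshold=3, window_s=3600.0):
        self.active = set(active)           # CPUs currently running work
        self.spares = list(spares)          # idle CPUs held in reserve
        self.threshold = threshold          # assumed: errors tolerated per window
        self.window_s = window_s            # assumed: sliding window in seconds
        self.errors = {cpu: deque() for cpu in self.active}
        self.deconfigured = []

    def record_corrected_error(self, cpu, now):
        """Log one recoverable cache error; return the spare swapped in, if any."""
        if cpu not in self.active:
            return None
        times = self.errors[cpu]
        times.append(now)
        # Forget errors that fell out of the sliding window.
        while times and now - times[0] > self.window_s:
            times.popleft()
        if len(times) < self.threshold:
            return None
        # Too many recoverable errors: retire this CPU while the system runs.
        self.active.discard(cpu)
        del self.errors[cpu]
        self.deconfigured.append(cpu)
        if not self.spares:
            return None                     # no spare left; capacity shrinks
        spare = self.spares.pop(0)
        self.active.add(spare)
        self.errors[spare] = deque()
        return spare
```

With `threshold=3` and `window_s=60`, three errors on the same CPU within 60 seconds retire it and activate the first spare, while errors spread further apart never trigger a swap.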

CPU cooling
Heat is a major enemy of electronic components. The Integrity rx7620-16 and rx8620-32 Servers’
two-level cooling scheme offers outstanding cooling capacity at a nominal cost. The servers’
turbo-cooler fans draw air directly into the heat sinks of the CPU and cell VLSI. At the extreme
operating ranges of the Integrity rx7620-16 and rx8620-32 Servers, the turbo-cooler fans keep
temperatures well below the maximum values allowed. Even though the turbo-coolers may not be
required under normal operating conditions, running them ensures that the silicon chips operate at
the lowest temperature, helping to ensure maximum lifetime.

To further improve reliability of the Integrity rx7620-16 and rx8620-32 Servers, manageability
software monitors the speeds of all fans, including turbo-cooler fans. The Integrity rx7620-16 and
rx8620-32 Servers’ smart fan controller can detect the first hint of slowdown associated with bearing
wear, making sure you get plenty of warning before a fan fails.
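
The early-warning idea behind fan speed monitoring can be sketched as a comparison of recent speed readings against the fan's nominal speed. This is an assumption-laden illustration, not the smart fan controller's actual logic: the function name `fan_status`, the 90% and 50% thresholds, and the five-sample averaging window are all hypothetical.

```python
def fan_status(rpm_samples, nominal_rpm, warn_fraction=0.90,
               fail_fraction=0.50, window=5):
    """Classify a fan as 'ok', 'warning', or 'failed' from recent RPM readings.

    Averaging the last few samples keeps one noisy reading from raising an
    alarm, while a sustained slowdown (for example from bearing wear) still
    pulls the average below the warning threshold.
    """
    if not rpm_samples:
        raise ValueError("need at least one RPM sample")
    recent = rpm_samples[-window:]
    average = sum(recent) / len(recent)
    ratio = average / nominal_rpm
    if ratio < fail_fraction:
        return "failed"
    if ratio < warn_fraction:
        return "warning"
    return "ok"
```

A fan rated at 3000 RPM that settles near 2600 RPM would be reported as `warning` under these assumed thresholds, well before it stops turning.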
