Reliability and serviceability features, Chapter 6, Reliability features – Hitachi 1000 User Manual

www.hitachi.com

BladeSymphony 1000 Architecture

White Paper

Chapter 6

Reliability and Serviceability Features

Reliability, availability, and serviceability are key requirements for platforms running business-critical
application services. In today’s globally competitive environment, where users access applications
round-the-clock, downtime is unacceptable and can result in lost customers, revenue, and reputation.
The BladeSymphony 1000 is designed with a number of features intended to increase the uptime of the
system.

Reliability Features

The BladeSymphony 1000 is intended to execute core business operations, and its modular design increases
reliability through redundant components. Rather than focusing on making each individual component
highly available, the BladeSymphony 1000 uses multiple industry-standard components
to increase reliability cost-effectively. Redundant components also improve the serviceability of the
system by allowing it to continue operating while new components are added or failed
components are replaced.
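
As a rough illustration of why redundancy built from ordinary components can raise reliability, the following sketch computes the availability of a group of identical components when at least k of the n must be working. It assumes that components fail independently. The function name and the 0.99 availability figure are hypothetical values chosen for the example, not BladeSymphony 1000 specifications.

```python
from math import comb


def k_of_n_availability(n: int, k: int, a: float) -> float:
    """Probability that at least k of n components are working.

    Assumes each component is available with probability a and that
    failures are independent (an idealization, not a measured property).
    """
    if not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    if not 0.0 <= a <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    # Sum the binomial probabilities of exactly i working components.
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Hypothetical figure: each module is available 99% of the time.
    single = k_of_n_availability(1, 1, 0.99)
    pair = k_of_n_availability(2, 1, 0.99)
    print(f"single: {single:.4f}, redundant pair: {pair:.4f}")
    # prints "single: 0.9900, redundant pair: 0.9999"
```

Under these assumptions, a redundant pair in which either module can carry the load is unavailable only when both modules fail at once. The same function covers N+1 style arrangements by setting k to the number of modules the workload needs and n to the number installed.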

The BladeSymphony 1000 is designed with features that help ensure the system does not crash because of a
failure and that minimize the effects of a failure. These features are listed in Table 11.

Table 11: Reliability features

Function                                  Feature
----------------------------------------  ------------------------------------------------------------
Quickly detect/diagnose failed part       BIOS self-diagnostic function
                                          Memory scrubbing function (Intel Itanium Server Blade)

Failure recovery by retry and correction  ECC function (memory, CPU bus, SMP link (Intel Itanium Server Blade))
                                          CRC retry function (PCIe, SCSI)

Dynamic isolation of failed part          Advanced ECC, online spare memory

Redundant configurations                  HDD Modules, redundant Switch & Management Modules,
                                          Power Modules, and Cooling Fan Modules
                                          Memory mirroring (Intel Xeon Server Blades)

Redundant system configurations           Redundant LAN/FC modules
                                          Cluster system configuration, N+1/N+M configurations

Obtain failure information                Isolation of failed part using System Event Log,
                                          BladeSymphony Management Suite, and Storage Manager
                                          Automatic notification of failure by ASSIST via email

Block failed part                         Isolation of failed part upon system boot

Repair failed part during operation       Repair PCI adapter, Switch & Management Module, Power
                                          Module, Cooling Fan Module while system is operating
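
The "CRC retry" entry in Table 11 refers to detecting a corrupted transfer with a checksum and repeating it. The sketch below illustrates only that general detect-and-retry idea in software, using Python's standard `zlib.crc32`. The frame layout, the function names, and the simulated faulty channel are all assumptions made for the example. They do not describe how the PCIe or SCSI hardware in the BladeSymphony 1000 implements the feature.

```python
import zlib
from typing import Callable, Optional, Tuple


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 to the payload (hypothetical layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_frame(frame: bytes) -> Optional[bytes]:
    """Return the payload if its CRC matches, otherwise None."""
    if len(frame) < 4:
        return None
    payload, received = frame[:-4], int.from_bytes(frame[-4:], "big")
    return payload if zlib.crc32(payload) == received else None


def send_with_retry(
    payload: bytes,
    channel: Callable[[bytes], bytes],
    max_attempts: int = 3,
) -> Tuple[bytes, int]:
    """Send a frame through the channel, resending when the CRC check fails.

    Returns the verified payload and the number of attempts used.
    """
    frame = make_frame(payload)
    for attempt in range(1, max_attempts + 1):
        result = check_frame(channel(frame))
        if result is not None:
            return result, attempt
    raise IOError(f"CRC check failed after {max_attempts} attempts")


def flaky_channel(failures: int) -> Callable[[bytes], bytes]:
    """Simulated link that flips one bit in each of the first `failures` frames."""
    remaining = [failures]

    def channel(frame: bytes) -> bytes:
        if remaining[0] > 0:
            remaining[0] -= 1
            return bytes([frame[0] ^ 0x01]) + frame[1:]
        return frame

    return channel


if __name__ == "__main__":
    data, attempts = send_with_retry(b"hello", flaky_channel(1))
    print(data, attempts)  # prints "b'hello' 2"
```

A transient error is absorbed by the retry and stays invisible to the caller. A persistent error exhausts the attempts and surfaces as a failure, which is the point at which isolation or replacement of the part would take over.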
