High availability overview, Availability requirements, Availability evaluation – H3C Technologies H3C S10500 Series Switches User Manual


High availability overview

Communication interruptions can seriously affect widely deployed value-added services such as IPTV and video conferencing. Therefore, the basic network infrastructure must provide high availability.

Availability can be improved in the following ways:

Increasing fault tolerance

Speeding up fault recovery

Reducing impact of faults on services

Availability requirements

Availability requirements fall into three levels based on purpose and implementation, as shown in Table 1.

Table 1 Availability requirements

| Level | Requirement | Solution |
|---|---|---|
| 1 | Decrease system software and hardware faults | Hardware: simplifying circuit design, enhancing production techniques, and performing reliability tests. Software: reliability design and testing. |
| 2 | Protect system functions from being affected if faults occur | Device and link redundancy and deployment of switchover strategies |
| 3 | Enable the system to recover as fast as possible | Applying fault detection, diagnosis, isolation, and recovery technologies |

The level 1 availability requirement should be considered during the design and production process of network devices. Level 2 should be considered during network design. Level 3 should be considered during network deployment, according to the network infrastructure and service characteristics.

Availability evaluation

Mean Time Between Failures (MTBF) and Mean Time to Repair (MTTR) are used to evaluate the availability of a network.

MTBF

MTBF is the predicted elapsed time between inherent failures of a system during operation. It is typically expressed in hours. A higher MTBF means higher availability.
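As a rough illustration of how an MTBF figure in hours can be obtained, the sketch below estimates it from observed field data as total operating time divided by the number of failures. This is a common estimation approach, not a method described in this manual; the function name and the fleet numbers are hypothetical.

```python
def estimate_mtbf_hours(total_operating_hours: float, failure_count: int) -> float:
    """Estimate MTBF as total operating time divided by the number of failures.

    Assumption: failures are counted over the same period and the same
    devices that contributed the operating hours.
    """
    if failure_count <= 0:
        raise ValueError("at least one failure is needed to estimate MTBF")
    return total_operating_hours / failure_count


# Hypothetical fleet: 50 devices observed for 8,760 hours (one year) each,
# with 4 failures in total across the fleet.
fleet_hours = 50 * 8760
print(estimate_mtbf_hours(fleet_hours, 4))  # 109500.0
```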

MTTR

MTTR is the average time required to repair a failed system. MTTR in a broad sense also involves spare parts management and customer services.

MTTR = fault detection time + hardware replacement time + system initialization time + link recovery time + routing time + forwarding recovery time

A smaller value for any of these items means a smaller MTTR and higher availability.
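The MTTR breakdown above can be sketched in a few lines of Python. The example sums the six components listed in the formula and then combines the result with an MTBF value using the widely used steady-state relation availability = MTBF / (MTBF + MTTR). That relation is not stated in this manual and is included here as an assumption; the class name, field names, and all numeric values are hypothetical.

```python
from dataclasses import dataclass, astuple


@dataclass
class RecoveryTimes:
    """The six MTTR components from the formula, all in seconds."""
    fault_detection: float
    hardware_replacement: float
    system_initialization: float
    link_recovery: float
    routing: float
    forwarding_recovery: float

    def mttr_seconds(self) -> float:
        # MTTR is the plain sum of the six components.
        return sum(astuple(self))


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability, assuming MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical values for a single repair event.
times = RecoveryTimes(
    fault_detection=30.0,
    hardware_replacement=1800.0,
    system_initialization=300.0,
    link_recovery=10.0,
    routing=40.0,
    forwarding_recovery=20.0,
)
mttr_hours = times.mttr_seconds() / 3600  # convert seconds to hours
print(f"MTTR: {times.mttr_seconds():.0f} s")
print(f"Availability: {availability(100_000.0, mttr_hours):.6%}")
```

In this sketch, hardware replacement dominates the total, so shortening it (for example, through redundancy that avoids a manual swap) has the largest effect on MTTR.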
