Resource failure, Adjusting the poll intervals, Adjusting the threshold and period values – Dell PowerVault 770N (Deskside NAS Appliance) User Manual


To maintain cluster unity, the operating system uses the quorum resource to ensure that only one set of active, communicating nodes is allowed to operate as a cluster. A node can form a cluster only if it can gain control of the quorum resource. A node can join a cluster or remain in an existing cluster only if it can communicate with the node that controls the quorum resource. For example, if the private network (cluster interconnect) between cluster nodes 1 and 2 fails, each node assumes that the other node has failed, and both nodes attempt to continue operating as the cluster. If both nodes were allowed to operate as the cluster, the result would be two separate clusters using the same cluster name and competing for the same resources. To solve this problem, MSCS uses the node that owns the quorum resource to maintain cluster unity. In this scenario, the node that gains control of the quorum resource is allowed to form a cluster, and the other node fails over its resources and becomes inactive.
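
The arbitration described above can be sketched as a small simulation. This is only an illustration of the rule that one node at a time can own the quorum resource. It is not MSCS code, and the names QuorumResource, try_acquire, and resolve_split are hypothetical.

```python
from typing import Dict, List, Optional


class QuorumResource:
    """Hypothetical stand-in for the shared quorum resource (one owner at a time)."""

    def __init__(self) -> None:
        self.owner: Optional[str] = None

    def try_acquire(self, node: str) -> bool:
        # The first node to ask gains control; later requests from other nodes fail.
        if self.owner is None:
            self.owner = node
        return self.owner == node


def resolve_split(quorum: QuorumResource, nodes: List[str]) -> Dict[str, str]:
    """After an interconnect failure, each node tries to gain the quorum resource.

    The order of `nodes` models which node reaches the quorum resource first
    (an assumption made for this sketch).
    """
    states: Dict[str, str] = {}
    for node in nodes:
        # The winner forms the cluster; the loser becomes inactive.
        states[node] = "active" if quorum.try_acquire(node) else "inactive"
    return states


print(resolve_split(QuorumResource(), ["node1", "node2"]))
```

In this sketch, whichever node reaches the quorum resource first stays active, so two clusters with the same name can never run at the same time.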

Resource Failure

A failed resource is not operational on the current host node. At periodic intervals, the Cluster Service invokes the Resource Monitor to check whether the resource appears operational. The Resource Monitor uses the resource DLL for each resource to detect whether the resource is functioning properly. The resource DLL communicates the results back through the Resource Monitor to the Cluster Service.

Adjusting the Poll Intervals

You can specify how frequently the Cluster Service checks for failed resources by setting the Looks Alive (general resource check) and Is Alive (detailed resource check) poll intervals. The Cluster Service requests a more thorough check of the resource's state at each Is Alive interval than it does at each Looks Alive interval; therefore, the Is Alive poll interval is typically longer than the Looks Alive poll interval.

NOTE: Do not adjust the Looks Alive and Is Alive settings unless instructed by technical support.
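
To illustrate how the two poll intervals interact, the following sketch lists which check would run at each point in time. The function names and the interval values are hypothetical and chosen only for illustration. The rule that the detailed check replaces the general check when both fall due together is an assumption of this sketch, not documented Cluster Service behavior.

```python
from typing import List, Optional, Tuple


def check_due(elapsed_ms: int, looks_alive_ms: int, is_alive_ms: int) -> Optional[str]:
    """Return the check that is due at elapsed_ms, or None if no check is due."""
    if elapsed_ms <= 0:
        return None
    # Assumption: when both checks fall due, the detailed check is the one that runs.
    if elapsed_ms % is_alive_ms == 0:
        return "IsAlive"
    if elapsed_ms % looks_alive_ms == 0:
        return "LooksAlive"
    return None


def poll_schedule(duration_ms: int, looks_alive_ms: int, is_alive_ms: int,
                  tick_ms: int) -> List[Tuple[int, str]]:
    """List (time, check) pairs for every tick on which a check is due."""
    schedule: List[Tuple[int, str]] = []
    for t in range(tick_ms, duration_ms + 1, tick_ms):
        check = check_due(t, looks_alive_ms, is_alive_ms)
        if check is not None:
            schedule.append((t, check))
    return schedule


# Illustrative values only: a general check every 5 s, a detailed check every 10 s.
for when, check in poll_schedule(20000, 5000, 10000, 5000):
    print(when, check)
```

Because the Is Alive interval is the longer one, the detailed check runs less often than the general check.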

Adjusting the Threshold and Period Values

If the resource DLL reports that the resource is not operational, the Cluster Service attempts to restart the resource. You can specify the number of times the Cluster Service can attempt to restart a resource in a given time interval. If the Cluster Service exceeds the maximum number of restart attempts (Threshold value) within the specified time period (Period value), and the resource is still not operational, the Cluster Service considers the resource to be failed.

NOTE: See "Setting Advanced Resource Properties" to configure the Looks Alive, Is Alive, Threshold, and Period values for a particular resource.

NOTE: Do not adjust the Threshold and Period values unless instructed by technical support.
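
The Threshold and Period logic can be sketched as a restart counter over a time window. This is a minimal illustration, not the Cluster Service implementation. The class name RestartPolicy, the use of seconds, and the sliding-window interpretation of the Period value are all assumptions.

```python
from typing import List


class RestartPolicy:
    """Hypothetical model: allow at most `threshold` restarts per `period_s` seconds."""

    def __init__(self, threshold: int, period_s: float) -> None:
        self.threshold = threshold
        self.period_s = period_s
        self.attempts: List[float] = []  # timestamps of earlier restart attempts

    def on_failure(self, now_s: float) -> str:
        """Decide what to do when the resource is reported as not operational."""
        # Keep only the attempts that still fall inside the period window.
        self.attempts = [t for t in self.attempts if now_s - t < self.period_s]
        if len(self.attempts) >= self.threshold:
            # The restart budget for this period is used up: treat the resource as failed.
            return "failed"
        self.attempts.append(now_s)
        return "restart"


# Illustrative values only: 3 restarts allowed within 900 seconds.
policy = RestartPolicy(threshold=3, period_s=900)
for t in (0, 10, 20, 30):
    print(t, policy.on_failure(t))
```

With these example values, the first three failures each trigger a restart, and the fourth failure inside the same period marks the resource as failed.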

Configuring Failover

You can configure a resource to fail over an entire group to another node when a resource in that group fails for any reason. If the failed resource is configured to cause the group that contains the resource to fail over to another node, the Cluster Service attempts a failover. If the number of failover attempts exceeds the group's threshold and the resource is still in a failed state, the Cluster Service attempts to restart the resource. The restart attempt is made after the period of time specified by the resource's Retry Period On Failure property, a property common to all resources.

When you configure the Retry Period On Failure property, consider the following guidelines:

Select a unit value of minutes rather than milliseconds (the default unit is milliseconds).

Select a value that is greater than or equal to the value of the resource's restart period property. The Cluster Service enforces this rule.
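
The two guidelines above amount to a unit conversion and a comparison, which the following sketch shows. The function names are hypothetical and the example values are illustrative. The Cluster Service performs its own enforcement, so this only shows the arithmetic and is not a replacement for that check.

```python
MS_PER_MINUTE = 60_000


def minutes_to_ms(minutes: int) -> int:
    """Convert a value entered in minutes to milliseconds."""
    return minutes * MS_PER_MINUTE


def validate_retry_period(retry_minutes: int, restart_period_ms: int) -> int:
    """Return the retry period in milliseconds if it satisfies the guideline.

    Raises ValueError when the retry period is shorter than the restart period.
    """
    retry_ms = minutes_to_ms(retry_minutes)
    if retry_ms < restart_period_ms:
        raise ValueError(
            f"retry period {retry_ms} ms is shorter than restart period {restart_period_ms} ms"
        )
    return retry_ms


# Illustrative values only: a 15-minute retry period against a 900,000 ms restart period.
print(validate_retry_period(15, 900_000))
```

Entering the value in minutes avoids accidentally configuring a retry period of only a few milliseconds.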
