How Serviceguard Uses Arbitration

Serviceguard employs a lock disk, a quorum server, or arbitrator nodes to provide definitive arbitration to prevent split-brain conditions. This section describes how the software handles cluster formation and re-formation and supplies arbitration when necessary.

Cluster Startup

The cluster manager initializes a cluster, monitors its health, recognizes node failures when they occur, and regulates the re-formation of the cluster when a node joins or leaves. The cluster manager operates as a daemon process that runs on each node. During cluster startup and re-formation activities, one node is selected to act as the cluster coordinator. Although all nodes perform some cluster management functions, the cluster coordinator is the central point for inter-node communication.

Startup and Re-Formation

The cluster can start when the cmruncl command is issued from the command line. All nodes in the cluster must be present for cluster startup to complete. If not all nodes are present, the cluster must be started by issuing commands that explicitly specify the group of nodes to start. This requirement prevents a split-brain situation, in which separate groups of nodes each start as the same cluster.

Cluster re-formation occurs any time a node joins or leaves a running cluster. This can follow the reboot of an individual node, or the failure of all nodes in a cluster, as when an extended power failure brings down all SPUs.

Automatic cluster startup takes place if the flag AUTOSTART_CMCLD is set to 1 in the /etc/rc.config.d/cmcluster file. When any node reboots with this parameter set to 1, it rejoins an existing cluster or, if none exists, attempts to form a new cluster. As with the cmruncl command, automatic initial startup requires all nodes in the cluster to be present. If not all nodes are present, the cluster must be started manually by issuing commands.

Dynamic Cluster Re-Formation

A dynamic re-formation is a temporary change in cluster membership that takes place as nodes join or leave a running cluster. Re-formation differs from reconfiguration, which is a permanent modification of the configuration files. Re-formation of the cluster occurs under the following conditions (not a complete list):

An SPU or network failure was detected on an active node.
An inactive node wants to join the cluster. The cluster manager daemon has been started on that node.
The system administrator halted a node.
A node halts because of a package failure.
A node halts because of a service failure.
Heavy network traffic prevented the heartbeat signal from being received by the cluster.
The heartbeat network failed, and another network is not configured to carry heartbeat.

Typically, re-formation results in a cluster with a different composition. The new cluster may contain fewer or more nodes than in the previous incarnation of the cluster.

Cluster Quorum and Cluster Locking

Recall that the algorithm for cluster re-formation requires a cluster quorum of a strict majority (that is, more than 50%) of the nodes previously running. If both halves (exactly 50%) of a previously running cluster were allowed to re-form, there would be a split-brain situation in which two instances of the same cluster were running.
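
The strict-majority rule can be expressed as a one-line test. The following Python sketch is purely illustrative and is not Serviceguard code; the function name has_quorum and its parameters are hypothetical.

```python
def has_quorum(surviving_nodes: int, previously_running: int) -> bool:
    """Return True if the surviving group is a strict majority (more than 50%)
    of the nodes that were previously running.

    Illustrative only: the name and signature are assumptions, not a
    Serviceguard interface.
    """
    if previously_running <= 0:
        raise ValueError("previously_running must be positive")
    if not 0 <= surviving_nodes <= previously_running:
        raise ValueError("surviving_nodes must be between 0 and previously_running")
    # Comparing 2 * s with p avoids floating-point division:
    # s / p > 1/2 is equivalent to 2 * s > p.
    return 2 * surviving_nodes > previously_running
```

For a cluster that had four nodes running, has_quorum(3, 4) is true, while has_quorum(2, 4) is false: a group of exactly half the nodes does not have quorum on its own.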

Cluster Lock

Although a cluster quorum of more than 50% is generally required, Serviceguard allows exactly 50% of the previously running nodes to re-form as a new cluster provided that the other 50% of the previously running nodes do not also re-form. This is guaranteed by the use of an arbiter or tie-breaker to choose between the two equal-sized node groups, allowing one group to form the cluster and forcing the other group to shut down. This type of arbitration is known as a cluster lock.

The cluster lock is used as a tie-breaker only for situations in which a running cluster fails and, as Serviceguard attempts to form a new cluster, the cluster is split into two sub-clusters of equal size. Each sub-cluster will attempt to acquire the cluster lock. The sub-cluster which gets the cluster lock will form the new cluster, preventing the possibility of two sub-clusters running at the same time. If the two sub-clusters are of unequal size, the sub-cluster with greater than 50% of the nodes will form the new cluster, and the cluster lock is not used.
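
The decision each sub-cluster faces during re-formation can be summarized in a small sketch. This Python example is a simplified model of the rules described above, not the actual Serviceguard algorithm; the names Outcome and reformation_outcome and the acquired_lock parameter are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    FORM_CLUSTER = "form cluster"
    HALT = "halt"


def reformation_outcome(group_size: int,
                        previously_running: int,
                        acquired_lock: bool = False) -> Outcome:
    """Model the outcome for one sub-cluster after a failure.

    acquired_lock is assumed to be False when no cluster lock is
    configured, when the lock is unavailable, or when the other
    sub-cluster obtained it first.
    """
    if 2 * group_size > previously_running:
        # More than 50% of the nodes: quorum is met and the lock is not used.
        return Outcome.FORM_CLUSTER
    if 2 * group_size == previously_running and acquired_lock:
        # Exactly 50%: the cluster lock breaks the tie.
        return Outcome.FORM_CLUSTER
    # Less than 50%, or a tie without the lock.
    return Outcome.HALT
```

In a four-node cluster split into two groups of two, only the group for which acquired_lock is true forms the new cluster; the other group halts.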

If you have a two-node cluster, you are required to configure the cluster lock. If communications are lost between these two nodes, the node that obtains the cluster lock will take over the cluster and the other node will undergo a forced halt. Without a cluster lock, a failure of either node in the cluster will result in a forced immediate system halt of the other node, and therefore the cluster will halt.

If the cluster lock fails or is unavailable during an attempt to acquire it, the cluster will halt. You can avoid this problem by configuring the cluster’s hardware so that the cluster lock is not lost due to an event that causes a failure in another cluster component.

No Cluster Lock

Normally, you should not configure a cluster of three or fewer nodes without a cluster lock. In two-node clusters, a cluster lock is required. You may consider using no cluster lock with configurations of three or more nodes, although the decision should take into account that any cluster may require tie-breaking. For example, if one node in a three-node cluster is removed for maintenance, the cluster re-forms as a two-node cluster. If a tie-breaking scenario later occurs due to a node or communication failure, there is no cluster lock to break the tie, and the entire cluster will become unavailable.

In a cluster with four or more nodes, you may not need a cluster lock since the chance of the cluster being split into two halves of equal size is very small. However, be sure to configure your cluster to prevent the failure of exactly half the nodes at one time. For example, make sure there is no potential single point of failure such as a single LAN between equal numbers of nodes, and that you use multiple power circuits with less than half of the nodes on any single power circuit.
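
The power-circuit guideline above lends itself to a simple check. The following Python sketch is illustrative only; the function name risky_circuits and the node-to-circuit mapping are hypothetical and are not produced by any Serviceguard tool.

```python
from collections import Counter
from typing import Dict, List


def risky_circuits(node_to_circuit: Dict[str, str]) -> List[str]:
    """Return the power circuits that feed half or more of the nodes.

    node_to_circuit maps a node name to the name of the power circuit
    that supplies it (an assumed, hand-maintained inventory).
    """
    total_nodes = len(node_to_circuit)
    nodes_per_circuit = Counter(node_to_circuit.values())
    # The guideline asks for less than half of the nodes on any circuit,
    # so a circuit is flagged when 2 * count >= total_nodes.
    return sorted(circuit for circuit, count in nodes_per_circuit.items()
                  if 2 * count >= total_nodes)
```

An empty result means that no single circuit failure can take down exactly half, or more than half, of the nodes. The same counting approach could be applied to other shared components, such as a LAN segment.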

Cluster lock disks are not allowed in clusters of more than four nodes. A quorum server or arbitrator nodes may be employed with larger clusters, and this kind of arbitration is necessary for extended distance clusters and with MetroCluster configurations to provide disaster tolerance.

Lock Requirements

The cluster lock can be implemented by means of a lock disk (HP-UX clusters only), a lock LUN (HP-UX and Linux clusters), or a quorum server (HP-UX and Linux clusters). A one-node cluster does not require a cluster lock. A two-node cluster requires a cluster lock. In larger clusters, the cluster lock is strongly recommended. If you have a cluster with more than four nodes, a cluster lock disk or lock LUN is not allowed, but you can use a quorum server. Therefore, if the cluster is expected to grow to more than four nodes and you want to use an arbitration mechanism, you should use a quorum server. In clusters that span several data centers, a more practical alternative may be the use of arbitrator nodes. Arbitrator nodes are not a form of cluster lock, but rather they are components that prevent the cluster from ever being partitioned into two equal-sized groups of nodes.
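
The lock requirements above can be condensed into a lookup by cluster size and platform. The following Python sketch restates those rules only; the function name cluster_lock_options and the platform strings are hypothetical. Arbitrator nodes are omitted because they are not a form of cluster lock.

```python
from typing import List, Tuple


def cluster_lock_options(node_count: int,
                         platform: str = "hp-ux") -> Tuple[str, List[str]]:
    """Return (requirement, permitted cluster lock mechanisms).

    platform is assumed to be either "hp-ux" or "linux".
    """
    if node_count < 1:
        raise ValueError("a cluster has at least one node")
    if platform not in ("hp-ux", "linux"):
        raise ValueError("platform must be 'hp-ux' or 'linux'")

    if node_count == 1:
        requirement = "not required"
    elif node_count == 2:
        requirement = "required"
    else:
        requirement = "strongly recommended"

    mechanisms = []
    if node_count <= 4:
        # Lock disks and lock LUNs are not allowed above four nodes.
        if platform == "hp-ux":
            mechanisms.append("lock disk")  # HP-UX clusters only
        mechanisms.append("lock LUN")
    mechanisms.append("quorum server")  # permitted at any cluster size
    return requirement, mechanisms
```

For example, cluster_lock_options(5) returns ("strongly recommended", ["quorum server"]), which reflects the advice to choose a quorum server when a cluster is expected to grow beyond four nodes.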