Managing the Cluster and Nodes

Managing the cluster involves the following tasks:

Starting the Cluster When All Nodes are Down
Adding Previously Configured Nodes to a Running Cluster
Removing Nodes from Participation in a Running Cluster
Halting the Entire Cluster

In Serviceguard A.11.16 and later, these tasks can be performed by non-root users with the appropriate privileges. See “Controlling Access to the Cluster” for more information about configuring access.

You can use Serviceguard Manager or the Serviceguard command line to start or stop the cluster, or to add or halt nodes. Starting the cluster means running the cluster daemon on one or more of the nodes in a cluster. You use different Serviceguard commands to start the cluster, depending on whether all nodes are currently down (that is, no cluster daemons are running), or whether you are starting the cluster daemon on an individual node.

Note the distinction that is made in this chapter between adding an already configured node to the cluster and adding a new node to the cluster configuration. An already configured node is one that is already entered in the cluster configuration file; a new node is added to the cluster by modifying the cluster configuration file.




	NOTE: Manually starting or halting the cluster or individual nodes does not require access to the quorum server, if one is configured. The quorum server is only used when tie-breaking is needed following a cluster partition.

Starting the Cluster When All Nodes are Down

You can use Serviceguard Manager, or Serviceguard commands as shown below, to start the cluster.

Using Serviceguard Commands to Start the Cluster

Use the cmruncl command to start the cluster when all cluster nodes are down. Particular command options can be used to start the cluster under specific circumstances.

The following command starts all nodes configured in the cluster and verifies the network information:

cmruncl

By default, cmruncl will do network validation, making sure the actual network setup matches the configured network setup. This is the recommended method. If you have recently checked the network and find the check takes a very long time, you can use the -w none option to bypass the validation.

Use the -v (verbose) option to display the greatest number of messages.

The -n option specifies a particular group of nodes. Without this option, all nodes will be started. The following example starts up the locally configured cluster only on ftsys9 and ftsys10. (This form of the command should only be used when you are sure that the cluster is not already running on any node.)

cmruncl -v -n ftsys9 -n ftsys10




	CAUTION: Serviceguard cannot guarantee data integrity if you try to start a cluster with the cmruncl -n command while a subset of the cluster's nodes are already running a cluster. If the network connection is down between nodes, using cmruncl -n might result in a second cluster forming, and this second cluster might start up the same applications that are already running on the other cluster. The result could be two applications overwriting each other's data on the disks.
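The options described above can be combined on one command line. The following is a minimal sketch, not part of Serviceguard: the helper name build_cmruncl is hypothetical, and it only prints the cmruncl command it would assemble, so you can review the command before running it yourself on a node where you are sure the cluster is down.

```shell
#!/bin/sh
# Print (do not run) a cmruncl command line.
# Usage: build_cmruncl [-v] [-w none] [node ...]
# build_cmruncl is a hypothetical helper, not a Serviceguard command.
build_cmruncl() {
    cmd="cmruncl"
    # Pass leading options through unchanged.
    while [ $# -gt 0 ]; do
        case "$1" in
            -v) cmd="$cmd -v"; shift ;;
            -w) cmd="$cmd -w $2"; shift 2 ;;
            *)  break ;;
        esac
    done
    # Each remaining argument is a node name and gets its own -n flag.
    for node in "$@"; do
        cmd="$cmd -n $node"
    done
    printf '%s\n' "$cmd"
}

# Start only ftsys9 and ftsys10, with verbose output.
build_cmruncl -v ftsys9 ftsys10
# Start all nodes, skipping network validation.
build_cmruncl -v -w none
```

Printing the command instead of executing it is a deliberate choice here, because starting a subset of nodes with -n is only safe when no node is already running the cluster.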

Adding Previously Configured Nodes to a Running Cluster

You can use Serviceguard Manager, or Serviceguard commands as shown below, to bring a configured node up within a running cluster.

Using Serviceguard Commands to Add Previously Configured Nodes to a Running Cluster

Use the cmrunnode command to join one or more nodes to an already running cluster. Any node you add must already be a part of the cluster configuration. The following example adds node ftsys8 to the cluster that was just started with only nodes ftsys9 and ftsys10. The -v (verbose) option prints out all the messages:

cmrunnode -v ftsys8

By default, cmrunnode will do network validation, making sure the actual network setup matches the configured network setup. This is the recommended method. If you have recently checked the network and find the check takes a very long time, you can use the -w none option to bypass the validation.

Because the node’s cluster is already running, the node joins the cluster. Packages may be started, depending on the package configuration (see the description of the node_name package parameter). If the node does not find its cluster running, or the node is not part of the cluster configuration, the command fails.
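Because cmrunnode fails when the cluster is not running or the node is not configured, a script that calls it should check the result. The following is a minimal sketch under stated assumptions: the join_nodes helper and the DRY_RUN variable are hypothetical, and it assumes that cmrunnode accepts several node names as positional arguments. With DRY_RUN=1 (the default in this sketch) the command is only printed.

```shell
#!/bin/sh
# DRY_RUN=1 only prints the command; set DRY_RUN=0 on a real cluster node
# to execute it. DRY_RUN and join_nodes are hypothetical names.
DRY_RUN=${DRY_RUN:-1}

join_nodes() {
    if [ $# -eq 0 ]; then
        echo "usage: join_nodes node [node ...]" >&2
        return 2
    fi
    if [ "$DRY_RUN" = "1" ]; then
        echo "cmrunnode -v $*"
        return 0
    fi
    # Report a failure instead of silently continuing.
    if ! cmrunnode -v "$@"; then
        echo "cmrunnode failed: is the cluster running, and is each node in its configuration?" >&2
        return 1
    fi
}

# Join ftsys8 to the cluster already running on ftsys9 and ftsys10.
join_nodes ftsys8
```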

Removing Nodes from Participation in a Running Cluster

You can use Serviceguard Manager, or Serviceguard commands as shown below, to remove nodes from active participation in a cluster. This operation halts the cluster daemon, but it does not modify the cluster configuration. To remove a node from the cluster configuration permanently, you must recreate the cluster configuration file. See the next section.

Halting a node is a convenient way of bringing it down for system maintenance while keeping its packages available on other nodes. After maintenance, the packages can be returned to their primary node. See “Moving a Failover Package”.

To return a node to the cluster, use cmrunnode.




	NOTE: HP recommends that you remove a node from participation in the cluster (by running cmhaltnode as shown below, or Halt Node in Serviceguard Manager) before running the HP-UX shutdown command, especially in cases in which a packaged application might have trouble during shutdown and not halt cleanly.

Use cmhaltnode to halt one or more nodes in a cluster. The cluster daemon on the specified node stops, and the node is removed from active participation in the cluster.

To halt a node with a running package, use the -f option. If a package was running that can be switched to an adoptive node, the switch takes place and the package starts on the adoptive node. For example, the following command causes the Serviceguard daemon running on node ftsys9 in the sample configuration to halt and the package running on ftsys9 to move to an adoptive node. The -v (verbose) option prints out messages:

cmhaltnode -f -v ftsys9

This halts any packages running on the node ftsys9 by executing the halt instructions in each package's master control script. ftsys9 is halted and the packages start on their adoptive node.
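A maintenance window pairs the halt with a later cmrunnode on the same node. The following sketch shows the two halves as separate steps, since in practice they are run at different times, before and after the maintenance work. The helper names run, begin_maintenance, and end_maintenance and the DRY_RUN variable are hypothetical. With DRY_RUN=1 (the default in this sketch) each command is printed rather than executed.

```shell
#!/bin/sh
# DRY_RUN=1 prints commands instead of running them (hypothetical switch).
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Before maintenance: halt the node; -f lets its packages move to adoptive nodes.
begin_maintenance() {
    run cmhaltnode -f -v "$1"
}

# After maintenance: rejoin the node to the running cluster.
end_maintenance() {
    run cmrunnode -v "$1"
}

begin_maintenance ftsys9
end_maintenance ftsys9
```

Rejoining the node does not by itself move packages back to it; returning a package to its primary node is a separate step.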

Halting the Entire Cluster

You can use Serviceguard Manager, or Serviceguard commands as shown below, to halt a running cluster.

Use cmhaltcl to halt the entire cluster. This command causes all nodes in a configured cluster to halt their Serviceguard daemons. You can use the -f option to force the cluster to halt even when packages are running. You can use the command on any running node, for example:

cmhaltcl -f -v

This halts all the cluster nodes.

Automatically Restarting the Cluster

You can configure your cluster to restart automatically after an event, such as a long-term power failure, that brings down all nodes in the cluster. To do this, set AUTOSTART_CMCLD to 1 in the /etc/rc.config.d/cmcluster file on each node.
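To confirm the setting on a node, you can inspect the file. The following is a minimal sketch, assuming the file uses the plain shell assignment form AUTOSTART_CMCLD=1; the function name autostart_enabled is hypothetical.

```shell
#!/bin/sh
# Succeed if AUTOSTART_CMCLD is set to 1 in the given file.
# autostart_enabled is a hypothetical helper; the default path is the
# file named in the text above.
autostart_enabled() {
    file=${1:-/etc/rc.config.d/cmcluster}
    # Match an uncommented assignment such as: AUTOSTART_CMCLD=1
    grep -Eq '^[[:space:]]*AUTOSTART_CMCLD=1([[:space:]]|#|$)' "$file" 2>/dev/null
}

if autostart_enabled; then
    echo "cluster autostart is enabled"
else
    echo "cluster autostart is not enabled (or the file is missing)"
fi
```

Run the check on every node, because each node reads its own copy of the file at boot.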