Managing the Running Cluster

This section describes some approaches to routine management of the cluster. Additional tools and suggestions are found in Chapter 7, “Cluster and Package Maintenance.”

Checking Cluster Operation with Serviceguard Manager

You can check configuration and status information using Serviceguard Manager: from the System Management Homepage (SMH), choose Tools -> Serviceguard Manager.

Checking Cluster Operation with Serviceguard Commands

Serviceguard also provides several commands for control of the cluster:

cmviewcl checks status of the cluster and many of its components. A non-root user with the role of Monitor can run this command from a cluster node or see status information in Serviceguard Manager.
On systems that support CFS, the Veritas Cluster File System, cfscluster status gives information about the cluster; cfsdgadm display gives information about the cluster’s disk groups.
cmrunnode is used to start Serviceguard on a node. A non-root user with the role of Full Admin can run this command from a cluster node or through Serviceguard Manager.
cmhaltnode is used to manually stop a running node. (This command is also used by shutdown(1m).) A non-root user with the role of Full Admin can run this command from a cluster node or through Serviceguard Manager.
cmruncl is used to manually start a stopped cluster. A non-root user with Full Admin access can run this command from a cluster node, or through Serviceguard Manager.
cmhaltcl is used to manually stop a cluster. A non-root user with Full Admin access can run this command from a cluster node or through Serviceguard Manager.

You can use these commands to test cluster operation, as in the following:

If the cluster is not already running, start it. From the Serviceguard Manager menu, choose Run Cluster. From the command line, use cmruncl -v.
By default, cmruncl checks the networks: Serviceguard compares the actual network configuration with the network information in the cluster configuration. If you do not need this validation, use cmruncl -v -w none instead to turn off validation and save time.
When the cluster has started, make sure that cluster components are operating correctly. You can use Serviceguard Manager to do this, or use the command cmviewcl -v.
Make sure that all nodes and networks are functioning as expected. For more information, see Chapter 7 “Cluster and Package Maintenance”.
Verify that nodes leave and enter the cluster as expected using the following steps:
- Halt the node. You can use Serviceguard Manager or use the cmhaltnode command.
- Check the cluster membership to verify that the node has left the cluster. You can do this in Serviceguard Manager, or use the cmviewcl command.
- Start the node. You can use Serviceguard Manager or use the cmrunnode command.
- To verify that the node has returned to operation, check in Serviceguard Manager, or use the cmviewcl command again.
Bring down the cluster. You can do this in Serviceguard Manager, or use the cmhaltcl -v -f command.
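
The test sequence above can be sketched as a small shell function. This is only an illustrative sketch, not a supported tool: the function name cluster_smoke_test, the run helper, the DRY_RUN variable, and the node name node2 are hypothetical, and passing the node name as an argument to cmhaltnode and cmrunnode is an assumption about typical usage. The sketch also assumes the cluster is not yet running. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the cluster test sequence described above.
# DRY_RUN defaults to 1, so commands are printed rather than executed.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

cluster_smoke_test() {
    node=$1                              # hypothetical node name, e.g. node2
    run cmruncl -v          || return 1  # start the cluster (assumed stopped)
    run cmviewcl -v         || return 1  # check cluster components
    run cmhaltnode "$node"  || return 1  # halt one node
    run cmviewcl            || return 1  # confirm the node has left
    run cmrunnode "$node"   || return 1  # restart the node
    run cmviewcl            || return 1  # confirm the node has rejoined
    run cmhaltcl -v -f                   # bring the cluster down
}
```

Calling cluster_smoke_test node2 prints the seven commands in order. Setting DRY_RUN=0 would execute them, which should be done only by a suitably authorized user on a cluster node, reviewing the cmviewcl output at each step.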

Additional cluster testing is described in Chapter 8 “Troubleshooting Your Cluster”. Refer to Appendix A for a complete list of Serviceguard commands. Refer to the Serviceguard Manager Help for a list of Serviceguard Administrative commands.

Preventing Automatic Activation of LVM Volume Groups

It is important to prevent LVM volume groups that are to be used in packages from being activated at system boot time by the /etc/lvmrc file. One way to ensure that this does not happen is to edit the /etc/lvmrc file on all nodes, setting AUTO_VG_ACTIVATE to 0, then including all the volume groups that are not cluster-bound in the custom_vg_activation function. Volume groups that will be used by packages should not be included anywhere in the file, since they will be activated and deactivated by control scripts.




	NOTE: Special considerations apply in the case of the root volume group: If the root volume group is mirrored using MirrorDisk/UX, include it in the `custom_vg_activation` function so that any stale extents in the mirror will be re-synchronized. Otherwise, the root volume group does not need to be included in the `custom_vg_activation` function, because it is automatically activated before the `/etc/lvmrc` file is used at boot time.
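
As an illustration, the relevant part of an edited /etc/lvmrc file might look something like the following. This is a sketch under assumptions: /dev/vg01 is a hypothetical local volume group that is not used by any package, and the rest of the file is omitted. Volume groups used by packages do not appear anywhere in it.

```shell
# Fragment of /etc/lvmrc (sketch only).
# /dev/vg01 is a hypothetical local, non-cluster volume group.

AUTO_VG_ACTIVATE=0      # do not activate all volume groups at boot

custom_vg_activation()
{
    # Activate only volume groups that are NOT used by packages.
    # A root volume group mirrored with MirrorDisk/UX would also be
    # handled here, as described in the NOTE above.
    /sbin/vgchange -a y /dev/vg01
    return 0
}
```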

Setting up Autostart Features

Automatic startup is the process in which each node individually joins a cluster; Serviceguard provides a startup script to control the startup process. Automatic cluster start is the preferred way to start a cluster. No action is required by the system administrator.

There are three cases:

The cluster is not running on any node, all cluster nodes are reachable, and all are attempting to start up. In this case, the node attempts to form a cluster consisting of all configured nodes.
The cluster is already running on at least one node. In this case, the node attempts to join that cluster.
Neither is true: the cluster is not running on any node, and not all the nodes are reachable and trying to start. In this case, the node will attempt to start for the AUTO_START_TIMEOUT period. If neither of the first two cases becomes true in that time, startup will fail.

To enable automatic cluster start, set the flag AUTOSTART_CMCLD to 1 in the /etc/rc.config.d/cmcluster file on each node in the cluster; the nodes will then join the cluster at boot time.

Here is an example of the /etc/rc.config.d/cmcluster file:

#************************  CMCLUSTER  ************************
# Highly Available Cluster configuration
#
# @(#) $Revision: 72.2 $
#
# AUTOSTART_CMCLD:    If set to 1, the node will attempt to
#                      join it's CM cluster automatically when
#                      the system boots.
#                      If set to 0, the node will not attempt
#                      to join it's CM cluster.
#
AUTOSTART_CMCLD=1




	NOTE: The `/sbin/init.d/cmcluster` file may call files that Serviceguard stores in the `/etc/cmcluster/rc` directory (HP-UX) or the `${SGCONF}/rc` directory (Linux). This directory is for Serviceguard use only! Do not move, delete, modify, or add files to this directory.
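
To confirm the setting on a node, a small check along the following lines may be convenient. This is a sketch only: the function name autostart_enabled is hypothetical, and it assumes the flag appears as a plain, uncommented AUTOSTART_CMCLD=value line, as in the example file above.

```shell
# Return success (0) if AUTOSTART_CMCLD is set to 1 in the given file.
# The path defaults to /etc/rc.config.d/cmcluster.
autostart_enabled() {
    conf=${1:-/etc/rc.config.d/cmcluster}
    # Ignore comment lines; if the flag is assigned more than once,
    # use the last assignment, as a sourced shell file would.
    value=$(sed -n 's/^[[:space:]]*AUTOSTART_CMCLD=\([0-9][0-9]*\).*/\1/p' "$conf" | tail -n 1)
    [ "$value" = "1" ]
}
```

For example, `if autostart_enabled; then echo enabled; else echo disabled; fi` reports the state of the local node. Because the flag must be set on each node in the cluster, the check would need to be repeated on every node.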

Changing the System Message

You may find it useful to modify the system's login message to include a statement such as the following:

This system is a node in a high availability cluster.
Halting this system may cause applications and services to
start up on another node in the cluster.

You might wish to include a list of all cluster nodes in this message, together with additional cluster-specific information.

The /etc/issue and /etc/motd files may be customized to include cluster-related information.

Managing a Single-Node Cluster

The number of nodes you will need for your Serviceguard cluster depends on the processing requirements of the applications you want to protect. You may want to configure a single-node cluster to take advantage of Serviceguard’s network failure protection.

In a single-node cluster, a cluster lock is not required, since there is no other node in the cluster. The output from the cmquerycl command omits the cluster lock information area if there is only one node.

You still need to have redundant networks, but you do not need to specify any heartbeat LANs, since there is no other node to send heartbeats to. In the cluster configuration ASCII file, specify all LANs that you want Serviceguard to monitor. For LANs that already have IP addresses, specify them with the STATIONARY_IP keyword, rather than the HEARTBEAT_IP keyword. For standby LANs, all that is required is the NETWORK_INTERFACE keyword with the LAN device name.
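
The network entries for a single-node cluster might be laid out roughly as printed by the sketch below. The helper name single_node_network_stanza, the node name node1, the LAN devices lan0 and lan1, and the address 192.0.2.10 are all hypothetical, and the indentation is for readability only. Only the network portion of the configuration file is shown.

```shell
# Print the network portion of a single-node cluster configuration file.
# Arguments (hypothetical values supplied by the caller):
#   $1 node name, $2 primary LAN device, $3 its IP address,
#   $4 standby LAN device
single_node_network_stanza() {
    printf 'NODE_NAME %s\n' "$1"
    printf '  NETWORK_INTERFACE %s\n' "$2"
    printf '    STATIONARY_IP %s\n' "$3"     # STATIONARY_IP, not HEARTBEAT_IP
    printf '  NETWORK_INTERFACE %s\n' "$4"   # standby LAN: device name only
}

single_node_network_stanza node1 lan0 192.0.2.10 lan1
```

The primary LAN carries a STATIONARY_IP entry because there is no other node to exchange heartbeats with, and the standby LAN is listed by device name alone.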

Single-Node Operation

Single-node operation occurs in a single-node cluster, or in a multi-node cluster after all but one node has failed or after you have shut down all but one node; the remaining node will probably have applications running. As long as the Serviceguard daemon cmcld is active, other nodes can re-join the cluster at a later time.

If the Serviceguard daemon fails when in single-node operation, it will leave the single node up and your applications running. This is different from the loss of the Serviceguard daemon in a multi-node cluster, which halts the node with a system reset, and causes packages to be switched to adoptive nodes.

It is not necessary to halt the single node in this scenario, since the application is still running, and no other node is currently available for package switching.

However, you should not try to restart Serviceguard, since data corruption might occur if the node were to attempt to start up a new instance of the application that is still running on the node. Instead of restarting the cluster, choose an appropriate time to shut down and reboot the node, which will allow the applications to shut down and then permit Serviceguard to restart the cluster after rebooting.

Disabling identd

Ignore this section unless you have a particular need to disable identd.

You can configure Serviceguard not to use identd.




	CAUTION: This is not recommended. Disabling `identd` removes an important security layer from Serviceguard. See the white paper Securing Serviceguard at `http://docs.hp.com -> High Availability -> Serviceguard -> White Papers` for more information.

If you must disable identd, you can do so by adding the -i option to the tcp hacl-cfg and hacl-probe commands in /etc/inetd.conf.

For example:

Change the cmclconfd entry in /etc/inetd.conf to:
hacl-cfg stream tcp nowait root /usr/lbin/cmclconfd cmclconfd -c -i

Change the cmomd entry in /etc/inetd.conf to:

hacl-probe stream tcp nowait root /opt/cmom/lbin/cmomd /opt/cmom/lbin/cmomd -i \
-f /var/opt/cmom/cmomd.log -r /var/opt/cmom

Restart inetd:
/etc/init.d/inetd restart

Deleting the Cluster Configuration

As root user, you can delete a cluster configuration from all cluster nodes by using Serviceguard Manager or the command line. The cmdeleteconf command prompts for a verification before deleting the files unless you use the -f option. You can delete the configuration only when the cluster is down. The action removes the binary configuration file from all the nodes in the cluster and resets all cluster-aware volume groups to be no longer cluster-aware.




	NOTE: The cmdeleteconf command removes only the cluster binary file `/etc/cmcluster/cmclconfig`. It does not remove any other files from the `/etc/cmcluster` directory.

Although the cluster must be halted, all nodes in the cluster should be powered up and accessible before you use the cmdeleteconf command. If a node is powered down, power it up and boot. If a node is inaccessible, you will see a list of inaccessible nodes together with the following message:

It is recommended that you do not proceed with the configuration operation unless you are sure these nodes are permanently unavailable. Do you want to continue?

Reply Yes to remove the configuration. Later, if the inaccessible node becomes available, you should run the cmdeleteconf command on that node to remove the configuration file.
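
From the command line, the overall sequence might look like the sketch below. It is illustrative only: the function name delete_cluster_config, the run helper, and the DRY_RUN variable are hypothetical, and the sketch assumes the cluster is currently running and must be halted first. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: halt the cluster, then delete its configuration.
# DRY_RUN defaults to 1, so commands are printed rather than executed.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

delete_cluster_config() {
    # The configuration can be deleted only when the cluster is down.
    # Skip this step if the cluster is already halted.
    run cmhaltcl -v -f || return 1
    # Without -f, cmdeleteconf asks for verification before deleting.
    run cmdeleteconf
}
```

Leaving out the -f option on cmdeleteconf keeps the verification prompt in place, which seems the safer choice for an operation that removes the configuration from every node.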