Managing Serviceguard Fifteenth Edition > Chapter 3 Understanding Serviceguard Software Components
Packages are the means by which Serviceguard starts and halts configured applications. Failover packages are also units of failover behavior in Serviceguard. A package is a collection of services, disk volumes and IP addresses that are managed by Serviceguard to ensure they are available. There can be a maximum of 150 packages per cluster and a total of 900 services per cluster.
There are three types of packages: failover packages, multi-node packages, and system multi-node packages.
System multi-node packages are supported only for use by applications supplied by Hewlett-Packard.
A failover package can be configured to have a dependency on a multi-node or system multi-node package. The package manager cannot start a package on a node unless the package it depends on is already up and running on that node.
The package manager will always try to keep a failover package running unless there is something preventing it from running on any node. The most common reasons for a failover package not being able to run are that auto_run is disabled so Serviceguard is not allowed to start the package, that node switching is disabled for the package on particular nodes, or that the package has a dependency that is not being met. When a package has failed on one node and is enabled to switch to another node, it will start up automatically in a new location where its dependencies are met. This process is known as package switching, or remote switching.
A failover package starts on the first available node in its configuration file; by default, it fails over to the next available one in the list. Note that you do not necessarily have to use a cmrunpkg command to restart a failed failover package; in many cases, the best way is to enable package and/or node switching with the cmmodpkg command.
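The node-selection behavior described above (ordered node list, per-node switching, dependencies that must already be running on the same node) can be illustrated with a small toy model. This is a sketch under stated assumptions, not Serviceguard's implementation: the classes `Node` and `FailoverPackage`, the function `select_node`, and all package and node names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    up: bool = True                 # node is currently a running cluster member
    switching_enabled: bool = True  # node switching enabled for this package
    running: set = field(default_factory=set)  # packages already up on this node


@dataclass
class FailoverPackage:
    name: str
    node_list: list                 # node names in configuration-file order
    depends_on: list = field(default_factory=list)  # (system) multi-node packages


def select_node(pkg, nodes, exclude=()):
    """Return the first eligible node in pkg.node_list, or None if none qualifies.

    `exclude` lets a caller skip a node, for example the one the package
    just failed on.
    """
    for name in pkg.node_list:
        node = nodes.get(name)
        if node is None or name in exclude:
            continue
        if not (node.up and node.switching_enabled):
            continue
        # A dependency must already be up and running on this same node.
        if all(dep in node.running for dep in pkg.depends_on):
            return name
    return None


nodes = {
    "node1": Node("node1"),                    # mnp1 is not running here
    "node2": Node("node2", running={"mnp1"}),
}
pkg1 = FailoverPackage("pkg1", ["node1", "node2"], depends_on=["mnp1"])
print(select_node(pkg1, nodes))  # node2
```

Although node1 is first in the list, the model skips it because the hypothetical dependency mnp1 is not running there.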
When you create the package, you specify the list of nodes on which it is allowed to run. System multi-node packages must list all nodes in the cluster. Multi-node packages and failover packages can name a subset of the cluster’s nodes or all of them.
If the auto_run parameter is set to yes in a package’s configuration file, Serviceguard automatically starts the package when the cluster starts. System multi-node packages are required to have auto_run set to yes. If a failover package has auto_run set to no, Serviceguard cannot start it automatically at cluster startup time; you must explicitly enable this kind of package using the cmmodpkg command.
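The node-list and auto_run rules above can be expressed as a simple configuration check. The sketch below is illustrative only: `validate_package`, its arguments, and the package-type strings are hypothetical and do not correspond to Serviceguard's own validation (which is performed by its commands, such as cmcheckconf).

```python
def validate_package(pkg_type, nodes, auto_run, cluster_nodes):
    """Return a list of configuration problems; an empty list means none found.

    pkg_type is one of "failover", "multi_node" or "system_multi_node".
    """
    problems = []
    # Every package may name only nodes that belong to the cluster.
    if not set(nodes) <= set(cluster_nodes):
        problems.append("package names a node that is not in the cluster")
    if pkg_type == "system_multi_node":
        # System multi-node packages must list every cluster node...
        if set(nodes) != set(cluster_nodes):
            problems.append("system multi-node package must list every cluster node")
        # ...and must start automatically with the cluster.
        if not auto_run:
            problems.append("system multi-node package requires auto_run yes")
    return problems


cluster = ["node1", "node2", "node3"]
print(validate_package("failover", ["node1", "node2"], False, cluster))  # []
```

A failover package with auto_run set to no passes this check; in the model it is simply not started at cluster startup until it is enabled explicitly.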
How does a failover package start up, and what is its behavior while it is running? Some of the many phases of package life are shown in Figure 3-13 “Legacy Package Time Line Showing Important Events”.
The following are the most important moments in a package’s life:
First, a node is selected. This node must be in the package’s node list, it must conform to the package’s failover policy, and any resources required by the package must be available on the chosen node. One resource is the subnet that is monitored for the package. If the subnet is not available, the package cannot start on this node. Another type of resource is a dependency on a monitored external resource or on a special-purpose package. If monitoring shows a value for a configured resource that is outside the permitted range, the package cannot start.
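The per-node resource conditions in the previous paragraph amount to two checks: every monitored subnet must be available, and every monitored resource value must be within its permitted range. The sketch below assumes, for illustration only, that a permitted range is a closed numeric interval `(low, high)`; real EMS resource conditions are configured differently. The function name `node_can_host` and the resource name used in the example are hypothetical.

```python
def node_can_host(subnets_up, required_subnets, resource_values, permitted_ranges):
    """Return True if a node meets the subnet and resource conditions."""
    # Every subnet monitored for the package must be available on the node.
    if not set(required_subnets) <= set(subnets_up):
        return False
    # Every monitored resource must report a value inside its permitted range
    # (assumed here to be a closed interval).
    for resource, (low, high) in permitted_ranges.items():
        value = resource_values.get(resource)
        if value is None or not (low <= value <= high):
            return False
    return True


ranges = {"free_disk_pct": (20, 100)}  # hypothetical resource and range
print(node_can_host({"192.0.2.0"}, ["192.0.2.0"], {"free_disk_pct": 40}, ranges))  # True
```

A node that fails either check is skipped, and the package cannot start there.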
Once a node is selected, a check is done to make sure the node allows the package to start on it. The control script on the selected node then starts the package’s services. Strictly speaking, the run script on the selected node is used to start a legacy package; the master control script starts a modular package.
Once the package manager has determined that the package can start on a particular node, it launches the script that starts the package (that is, a package’s control script or master control script is executed with the start parameter). This script carries out the following steps:
(Legacy package) At any step along the way, an error causes the script to exit abnormally (with an exit code of 1). For example, if a package service cannot be started, the control script exits with an error.
If run script execution does not complete within the time specified by run_script_timeout, the package manager kills the script. During run script execution, messages are written to a log file. For legacy packages, this file is in the same directory as the run script, with the same name as the run script plus the extension .log. For modular packages, the pathname is determined by the script_log_file parameter in the package configuration file (see “script_log_file”). Normal starts are recorded in the log, together with any error messages or warnings related to starting the package.
Exit codes on leaving the run script determine what happens to the package next. A normal exit means the package startup was successful, but all other exits mean that the start operation did not complete successfully.
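The timeout, logging, and exit-code behavior of the run script can be mimicked with Python's standard `subprocess` module. This is a sketch, not how the Serviceguard package manager is implemented: `run_control_script` and its result strings are hypothetical, and it only distinguishes a normal (zero) exit from any other exit or a timeout.

```python
import subprocess


def run_control_script(argv, timeout_s, log_path):
    """Run a package start script, kill it on timeout, and classify the result.

    Returns "success" for a normal (zero) exit, "failed" for any other exit
    code, and "timeout" if the script was still running after timeout_s seconds.
    """
    with open(log_path, "a") as log:
        try:
            # The script's output is appended to the package log file.
            result = subprocess.run(
                argv, stdout=log, stderr=subprocess.STDOUT, timeout=timeout_s
            )
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed and reaped the child here.
            log.write("start script killed after timeout\n")
            return "timeout"
    return "success" if result.returncode == 0 else "failed"
```

For example, `run_control_script(["/path/to/control_script", "start"], 600, "/tmp/pkg1.log")` would run a (hypothetical) control script with the start parameter and a 600-second limit.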
Within the package control script, the cmrunserv command starts the individual services. This command is executed once for each service coded in the file. You can configure a number of restarts for each service; cmrunserv passes this number to the package manager, which restarts the service up to that many times if it fails. The following are some typical settings in a legacy package; for more information about configuring services in modular packages, see the discussion of the service_name parameter and the comments in the package configuration template file.
During the normal operation of cluster services, the package manager continuously monitors the following:
Some failures can result in a local switch. For example, if there is a failure on a specific LAN card and there is a standby LAN configured for that subnet, then the Network Manager will switch to the healthy LAN card. If a service fails but the restart parameter for that service is set to a value greater than 0, the service will restart, up to the configured number of restarts, without halting the package.
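The restart rule described above (restart a failed service until the configured number of restarts is used up, then treat the failure as a package failure) can be modeled as a small counter. The class `ServiceRestartPolicy`, its method, and the service name are hypothetical; this is a toy model, not Serviceguard code.

```python
class ServiceRestartPolicy:
    """Track restarts for one package service (a toy model, not Serviceguard code)."""

    def __init__(self, service_name, max_restarts):
        self.service_name = service_name
        self.max_restarts = max_restarts  # the configured number of restarts
        self.restarts_used = 0

    def on_failure(self):
        """Return "restart" while restarts remain, otherwise "fail_package"."""
        if self.restarts_used < self.max_restarts:
            self.restarts_used += 1
            return "restart"
        return "fail_package"


policy = ServiceRestartPolicy("db_monitor", max_restarts=2)
print([policy.on_failure() for _ in range(3)])
# ['restart', 'restart', 'fail_package']
```

With a restart count of 0, the first failure already counts as a package failure.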
If there is a configured EMS resource dependency and there is a trigger that causes an event, the package will be halted.
During normal operation, while all services are running, you can see the status of the services in the “Script Parameters” section of the output of the cmviewcl command.
What happens when something goes wrong? If a service fails and there are no more restarts, if a subnet fails and there are no standbys, if a configured resource fails, or if a configured dependency on a special-purpose package is not met, then a failover package will halt on its current node and, depending on the setting of the package switching flags, may be restarted on another node. If a multi-node or system multi-node package fails, all of the packages that have configured a dependency on it will also fail.
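The rule that packages depending on a failed multi-node or system multi-node package also fail can be sketched as a graph traversal. Applying the rule repeatedly is an assumption of this sketch, so a chain such as system multi-node package, then multi-node package, then failover package propagates all the way. The function `affected_packages`, the `dependents` mapping, and the package names are hypothetical.

```python
def affected_packages(failed, dependents):
    """Return every package that fails along with `failed`.

    `dependents` maps a package name to the packages that configured a
    dependency on it. The rule is applied repeatedly, so failures propagate
    down chains of dependencies.
    """
    affected, to_visit = set(), [failed]
    while to_visit:
        pkg = to_visit.pop()
        for dep in dependents.get(pkg, ()):
            if dep not in affected:
                affected.add(dep)
                to_visit.append(dep)
    return affected


dependents = {"smnp1": ["mnp1"], "mnp1": ["pkg1", "pkg2"]}
print(sorted(affected_packages("smnp1", dependents)))  # ['mnp1', 'pkg1', 'pkg2']
```

A failover package with no dependents affects nothing else in this model.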
Package halting normally means that the package halt script executes (see the next section). However, if a failover package’s configuration has the service_fail_fast_enabled flag set to yes for the service that fails, then the node will halt as soon as the failure is detected. If this flag is not set, the loss of a service will result in halting the package gracefully by running the halt script.
If auto_run is set to yes, the package will start up on another eligible node, if it meets all the requirements for startup. If auto_run is set to no, then the package simply halts without starting up anywhere else.
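The two decisions in the preceding paragraphs (whether the node or only the package halts, and whether the package then starts elsewhere) can be summarized as a small decision function. The function `failure_actions` and its action strings are hypothetical labels chosen for this sketch; only the parameter names service_fail_fast_enabled and auto_run come from the text.

```python
def failure_actions(service_fail_fast_enabled, auto_run, eligible_node_available):
    """Return the actions that follow an unrecoverable service failure."""
    if service_fail_fast_enabled:
        actions = ["halt_node"]        # the node halts as soon as the failure is detected
    else:
        actions = ["run_halt_script"]  # graceful halt of the package only
    # The package restarts elsewhere only if auto_run allows it and a node qualifies.
    if auto_run and eligible_node_available:
        actions.append("start_on_other_node")
    return actions


print(failure_actions(False, True, True))  # ['run_halt_script', 'start_on_other_node']
```

With auto_run set to no, the model returns only the halt action, matching a package that halts without starting up anywhere else.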
The Serviceguard cmhaltpkg command executes the package halt script, which halts the services that are running for a specific package. This provides a graceful shutdown of the package, after which automatic package startup is disabled (see the auto_run parameter).
You cannot halt a multi-node or system multi-node package unless all packages that have a configured dependency on it are down. Use cmviewcl to check the status of dependents. For example, if pkg1 and pkg2 depend on PKGa, both pkg1 and pkg2 must be halted before you can halt PKGa.
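The halt precondition above, using the pkg1, pkg2, and PKGa example, can be sketched as a check that lists which dependents are still running. The function `blocking_dependents` and the `dependents` and `status` dictionaries are hypothetical stand-ins for information you would read from cmviewcl output.

```python
def blocking_dependents(pkg, dependents, status):
    """Return dependents of `pkg` that are not yet down (empty list: halt allowed)."""
    return [dep for dep in dependents.get(pkg, ()) if status.get(dep) != "down"]


dependents = {"PKGa": ["pkg1", "pkg2"]}
status = {"PKGa": "up", "pkg1": "down", "pkg2": "up"}
print(blocking_dependents("PKGa", dependents, status))  # ['pkg2']
```

Once pkg2 is also halted, the list is empty and, in this model, PKGa may be halted.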
The cmmodpkg command cannot be used to halt a package, but it can disable switching either on particular nodes or on all nodes. A package can continue running when its switching has been disabled, but it will not be able to start on other nodes if it stops running on its current node.
Once the package manager has detected the failure of a service or package that a failover package depends on, or when the cmhaltpkg command has been issued for a particular package, the package manager launches the halt script. That is, a package’s control script or master control script is executed with the stop parameter. This script carries out the following steps (also shown in Figure 3-15 “Legacy Package Time Line for Halt Script Execution”):
At any step along the way, an error causes the script to exit abnormally (with an exit code of 1). Also, if halt script execution does not complete within the time specified by HALT_SCRIPT_TIMEOUT, the package manager kills the script. During halt script execution, messages are written to a log file. For legacy packages, this file is in the same directory as the run script, with the same name as the run script plus the extension .log. For modular packages, the pathname is determined by the script_log_file parameter in the package configuration file (see “script_log_file”). Normal halts are recorded in the log, together with any error messages or warnings related to halting the package.
The package’s ability to move to other nodes is affected by the exit conditions on leaving the halt script. The following are the possible exit codes:
Table 3-3 “Error Conditions and Package Movement for Failover Packages” shows the possible combinations of error condition, failfast setting and package movement for failover packages.
Table 3-3 Error Conditions and Package Movement for Failover Packages