The following sections offer a few suggestions for troubleshooting
by reviewing the state of the running system and by examining cluster status
data, log files, and configuration files. Topics include:
Reviewing Package IP Addresses
Reviewing the System Log File
Reviewing Configuration Files
Reviewing the Package Control Script
Using cmquerycl and cmcheckconf
Using cmscancl and cmviewcl
Reviewing the LAN Configuration
NOTE: HP recommends you use Serviceguard Manager as a convenient
way to observe the status of a cluster and the properties of cluster
objects: from the System Management Homepage (SMH), select the cluster
you need to troubleshoot.
Reviewing Package IP Addresses
The netstat -in command can be used to examine the LAN configuration.
When run on ftsys9 after ftsys10 has been halted, the command shows that the package IP addresses
are assigned to lan0 on ftsys9 along with the primary LAN IP address.
ftsys9>netstat -in
IPv4:
Name    Mtu   Network    Address       Ipkts   Ierrs  Opkts  Oerrs  Coll
ni0#    0     none       none          0       0      0      0      0
ni1*    0     none       none          0       0      0      0      0
lo0     4608  127        127.0.0.1     10114   0      10     0      0
lan0    1500  15.13.168  15.13.171.14  959269  0      33069  0      0
lan0:1  1500  15.13.168  15.13.171.23  959269  0      33069  0      0
lan0:2  1500  15.13.168  15.13.171.20  959269  0      33069  0      0
lan1*   1500  none       none          418623  0      55822  0      0
IPv6:
Name    Mtu   Address/Prefix  Ipkts  Opkts
lan1*   1500  none            0      0
lo0     4136  ::1/128         10690  10690
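On a node that carries several packages, it can help to reduce the listing to the package addresses only. The following is a minimal sketch, not part of Serviceguard. It assumes, as in the output above, that package IP addresses appear as interface aliases of the form lanN:M, that the address is the fourth column in the IPv4 section, and that it is the third column in the IPv6 section. The function name list_package_ips is hypothetical.

```shell
# list_package_ips: read `netstat -in` output on stdin and print
# "<alias> <address>" for each interface alias such as lan0:1.
# Assumption: aliases (lanN:M) carry the package IP addresses, as in
# the sample output. The name list_package_ips is illustrative only.
list_package_ips() {
    awk 'BEGIN    { col = 4 }   # default to the IPv4 column layout
         /^IPv4:/ { col = 4 }   # Address is column 4 in the IPv4 section
         /^IPv6:/ { col = 3 }   # Address/Prefix is column 3 in the IPv6 section
         $1 ~ /^lan[0-9]+:[0-9]+$/ { print $1, $col }'
}

# Typical use on a cluster node:
#   netstat -in | list_package_ips
```

Keeping the filter as a function that reads standard input means it can also be applied to saved netstat output collected from another node.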
Reviewing the System Log File
Messages from the Cluster Manager and Package Manager are
written to the system log file. The default location of the log
file is /var/adm/syslog/syslog.log. Also, package-related messages are logged into
the package log file. The package log file is located in the package
directory, by default. You can use a text editor, such as vi, or the more command to view the log file for historical information
on your cluster.
It is always a good idea to review the syslog.log file on each of the nodes in the
cluster when troubleshooting cluster problems.
This log provides information on the following:
Commands executed and their outcome.
Major cluster events which may, or may not, be errors.
Cluster status information.
NOTE: Many other products running on HP-UX in addition to
Serviceguard use the syslog.log file to save messages. The HP-UX System
Administrator’s Guide provides additional information
on using the system log.
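Because syslog.log is shared with other products, filtering it for Serviceguard entries can make review faster. The sketch below is an illustration only. It assumes the message tags that appear in the sample entries in this section (cmcld, cmlvmd, cmclconfd, and tags beginning with CM-); other Serviceguard components may log under tags not listed here. The function name sg_messages is hypothetical.

```shell
# sg_messages: read syslog lines on stdin and keep only those whose
# tag matches a Serviceguard daemon or command seen in the sample
# entries. Assumption: the tag list is incomplete and illustrative.
sg_messages() {
    grep -E ' (cmcld|cmlvmd|cmclconfd|CM-[A-Za-z0-9_-]+)\[[0-9]+]: '
}

# Typical use, run on each node in the cluster:
#   sg_messages < /var/adm/syslog/syslog.log | more
```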
Sample System Log Entries
The following entries from the file /var/adm/syslog/syslog.log show a package that failed to run because of a problem
in the pkg5_run script. For details, look at the pkg5_run.log file named in the last entry.
Dec 14 14:33:48 star04 cmcld[2048]: Starting cluster management protocols.
Dec 14 14:33:48 star04 cmcld[2048]: Attempting to form a new cluster
Dec 14 14:33:53 star04 cmcld[2048]: 3 nodes have formed a new cluster
Dec 14 14:33:53 star04 cmcld[2048]: The new active cluster membership is: star04(id=1), star05(id=2), star06(id=3)
Dec 14 17:33:53 star04 cmlvmd[2049]: Clvmd initialized successfully.
Dec 14 14:34:44 star04 CM-CMD[2054]: cmrunpkg -v pkg5
Dec 14 14:34:44 star04 cmcld[2048]: Request from node star04 to start package pkg5 on node star04.
Dec 14 14:34:44 star04 cmcld[2048]: Executing '/etc/cmcluster/pkg5/pkg5_run start' for package pkg5.
Dec 14 14:34:45 star04 LVM[2066]: vgchange -a n /dev/vg02
Dec 14 14:34:45 star04 cmcld[2048]: Package pkg5 run script exited with NO_RESTART.
Dec 14 14:34:45 star04 cmcld[2048]: Examine the file /etc/cmcluster/pkg5/pkg5_run.log for more details.
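The last entry of a failed start names the package log to read next. As a rough sketch, the following function pulls those log paths out of the system log. It assumes the message wording is exactly as in the entries above ("Examine the file ... for more details."); the function name failed_pkg_logs is hypothetical.

```shell
# failed_pkg_logs: read syslog lines on stdin and print the package
# log path from each cmcld "Examine the file ..." message.
# Assumption: the message text matches the sample entries exactly.
failed_pkg_logs() {
    sed -n 's/.*cmcld\[[0-9]*]: Examine the file \(.*\) for more details\.$/\1/p'
}

# Typical use:
#   failed_pkg_logs < /var/adm/syslog/syslog.log
```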
The following is an example of a successful package starting:
Dec 14 14:39:27 star04 CM-CMD[2096]: cmruncl
Dec 14 14:39:27 star04 cmcld[2098]: Starting cluster management protocols.
Dec 14 14:39:27 star04 cmcld[2098]: Attempting to form a new cluster
Dec 14 14:39:27 star04 cmclconfd[2097]: Command execution message
Dec 14 14:39:33 star04 cmcld[2098]: 3 nodes have formed a new cluster
Dec 14 14:39:33 star04 cmcld[2098]: The new active cluster membership is: star04(id=1), star05(id=2), star06(id=3)
Dec 14 17:39:33 star04 cmlvmd[2099]: Clvmd initialized successfully.
Dec 14 14:39:34 star04 cmcld[2098]: Executing '/etc/cmcluster/pkg4/pkg4_run start' for package pkg4.
Dec 14 14:39:34 star04 LVM[2107]: vgchange /dev/vg01
Dec 14 14:39:35 star04 CM-pkg4[2124]: cmmodnet -a -i 15.13.168.0 15.13.168.4
Dec 14 14:39:36 star04 CM-pkg4[2127]: cmrunserv Service4 /vg01/MyPing 127.0.0.1 >>/dev/null
Dec 14 14:39:36 star04 cmcld[2098]: Started package pkg4 on node star04.
Reviewing Object Manager Log Files
The Serviceguard Object Manager daemon cmomd logs messages to the file /var/opt/cmom/cmomd.log. You can review these messages using the cmreadlog command, as follows:
cmreadlog /var/opt/cmom/cmomd.log
Messages from cmomd include information about the processes that request
data from the Object Manager, including type of data, timestamp,
etc.
Reviewing Serviceguard Manager Log Files
From the System Management Homepage (SMH), click Tools, select Serviceguard Manager, select the cluster you are interested in, and then choose View -> Operation Log.
Reviewing the System Multi-node Package Files
If you are running Veritas Cluster Volume Manager and you
have problems starting the cluster, check the log file for the system
multi-node package. For Cluster Volume Manager (CVM) 3.5, the file
is VxVM-CVM-pkg.log. For CVM 4.1 and later, the file is SG-CFS-pkg.log.
Reviewing Configuration Files
Review the following ASCII configuration files:
Cluster configuration file.
Package configuration files.
Ensure that the files are complete and correct according to
your configuration planning worksheets.
Reviewing the Package Control Script
Ensure that the package control script is found on all nodes
where the package can run and that the file is identical on all
nodes. Ensure that the script is executable on all nodes. Ensure
that the name of the control script appears in the package configuration
file, and ensure that all services named in the package configuration
file also appear in the package control script.
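One way to confirm that the control script is identical everywhere is to compare checksums. The sketch below assumes you have already gathered a copy of the script from each node into local files by whatever file-transfer method your site uses; it does not contact the nodes itself. The function name scripts_identical and the example paths are hypothetical. Execute permission still has to be checked on each node directly, for example with ls -l.

```shell
# scripts_identical FILE...: succeed only if every FILE exists and all
# files have the same cksum checksum and size as the first one.
# Assumption: each FILE is a copy of the control script taken from a
# different node. The function name is illustrative only.
scripts_identical() {
    ref=""
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            return 1
        fi
        sum=$(cksum < "$f")        # prints "<checksum> <size>"
        if [ -z "$ref" ]; then
            ref=$sum               # first file becomes the reference
        elif [ "$sum" != "$ref" ]; then
            echo "differs from first copy: $f" >&2
            return 1
        fi
    done
    return 0
}

# Example with hypothetical local copies:
#   scripts_identical /tmp/ftsys9.pkg_run /tmp/ftsys10.pkg_run
```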
Information about the starting and halting of each package
is found in the package’s control script log. This log
provides the history of the operation of the package control script.
By default, it is found at /etc/cmcluster/<package_name>/control_script.log, but another location may have been specified
in the script_log_file parameter of the package configuration file. This log documents all package run and halt
activities. If you have written a separate run and halt script for
a legacy package, each script will have its own log.
Using the cmcheckconf Command
In addition, cmcheckconf can be used to troubleshoot your cluster just
as it was used to verify the configuration.
The following example shows the commands used to verify the
existing cluster configuration on ftsys9 and ftsys10:
cmquerycl -v -C /etc/cmcluster/verify.ascii -n ftsys9 -n ftsys10
cmcheckconf -v -C /etc/cmcluster/verify.ascii
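If you run this pair of commands often, they can be wrapped so that cmcheckconf runs only when cmquerycl succeeds. This is a sketch under the assumption that both commands return a nonzero exit status on failure; the function name verify_cluster is hypothetical, and only the options shown in the example above are used.

```shell
# verify_cluster ASCII_FILE NODE...: regenerate the ASCII file from the
# named nodes, then check it. Stops after cmquerycl if that step fails.
# Assumption: both commands signal failure with a nonzero exit status.
verify_cluster() {
    ascii=$1
    shift
    node_args=""
    for n in "$@"; do
        node_args="$node_args -n $n"
    done
    # $node_args is left unquoted on purpose so that it splits into
    # separate "-n <node>" arguments.
    cmquerycl -v -C "$ascii" $node_args && cmcheckconf -v -C "$ascii"
}

# Typical use:
#   verify_cluster /etc/cmcluster/verify.ascii ftsys9 ftsys10
```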
The cmcheckconf command checks:
The network addresses and connections.
The cluster lock disk connectivity.
The validity of configuration parameters of the
cluster and packages for:
The existence and permission of scripts.
It doesn’t check:
The correct setup of the power circuits.
The correctness of the package configuration script.
Using the cmscancl Command
The command cmscancl displays information about all the nodes in a cluster
in a structured report that allows you to compare such items as IP
addresses or subnets, physical volume names for disks, and other node-specific
items for all nodes in the cluster. cmscancl actually runs several different HP-UX commands
on all nodes and gathers the output into a report on the node where
you run the command.
To run the cmscancl command, the root user on the cluster nodes must have
the .rhosts file configured to allow the command to complete successfully. Without
that, the command can only collect information on the local node,
rather than all cluster nodes.
The following are the types of configuration data that cmscancl displays for each node:
Table 8-1 Data Displayed by the cmscancl Command
Description | Source of Data |
---|---|
LAN device configuration and status | lanscan command |
network status and interfaces | netstat command |
file systems | mount command |
LVM configuration | /etc/lvmtab file |
LVM physical volume group data | /etc/lvmpvg file |
link level connectivity for all links | linkloop command |
binary configuration file | cmviewconf command |
Using the cmviewconf Command
cmviewconf allows you to examine the binary cluster configuration
file, even when the cluster is not running. The command displays
the content of this file on the node where you run the command.
Reviewing the LAN Configuration
The following networking commands can be used to diagnose
problems:
netstat -in can be used to examine the LAN configuration. This command
lists all IP addresses assigned to each LAN interface card.
lanscan can also be used to examine the LAN configuration. This command
lists the MAC addresses and status of all LAN interface cards on
the node.
arp -a can be used to check the ARP tables.
landiag is useful to display, diagnose, and reset LAN card information.
linkloop verifies the communication between LAN cards at MAC address
levels. For example, if you enter
linkloop -i4 0x08000993AB72
you should see the following message:
Link Connectivity to LAN station: 0x08000993AB72 OK
cmscancl can be used to verify that primary and standby LANs are on
the same bridged net.
cmviewcl -v shows the status of primary and standby LANs.
Use these commands on all nodes.
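When checking many links, the linkloop result can be tested in a script instead of being read by eye. The following sketch assumes that a successful run ends its message with OK, as in the example above; the exact wording may differ between HP-UX releases, so adjust the pattern to what your system prints. The function name link_ok is hypothetical.

```shell
# link_ok: read linkloop output on stdin and succeed if any line ends
# with "OK". Assumption: success is reported as in the example message
# "Link Connectivity to LAN station: <MAC> OK".
link_ok() {
    grep -q 'OK[[:space:]]*$'
}

# Typical use, with the MAC address from the example:
#   if linkloop -i4 0x08000993AB72 | link_ok; then
#       echo "link to 0x08000993AB72 is up"
#   fi
```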