HP-UX System Administrator's Guide: Overview: HP-UX 11i Version 3 > Chapter 3 Major Components of HP-UX

Start-up and Shutdown

Whenever you turn on (or reset) your computer, the hardware, firmware, and software must be initialized in a carefully orchestrated sequence of events known as the boot sequence. A similar sequence, known as the shutdown sequence, refers to the orderly sequence of steps needed to halt HP-UX. The shutdown sequence ensures all running processes are properly stopped and any data in memory that needs to be written to disk is not lost when the operating system is halted and power to the server is lost.

Run Levels

When HP-UX (or any operating system) is up and running, it is said to be booted. When HP-UX is not running, it is said to be halted. HP-UX, like most Unix based operating systems, has several levels of the “booted” state known as run levels. As HP-UX starts up or shuts down, it transitions through the various run levels until it reaches its targeted run level. The various run levels determine what aspects of HP-UX are running.

At boot time, a daemon known as init is started. Its primary role is to create processes from a script stored in the file /etc/inittab (see inittab(4)). The /etc/inittab file is one of the mechanisms used to configure what aspects of HP-UX are running at any given run level. The inittab file can also specify the initial run level that the system will boot to.
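
To illustrate how an initial run level entry is laid out, the following Python sketch reads inittab-style text and reports the run level named by the initdefault entry. It assumes the colon-separated field order described in inittab(4) (id, run states, action, process). The function name and the sample text are hypothetical, and this is not a description of how init itself is implemented.

```python
def default_run_level(inittab_text):
    """Return the run level named by the first initdefault entry, or None."""
    for raw in inittab_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumed layout: id:rstate:action:process
        fields = line.split(":", 3)
        if len(fields) < 3:
            continue
        rstate, action = fields[1], fields[2]
        if action == "initdefault":
            return rstate
    return None


# Hypothetical excerpt, not a complete /etc/inittab file.
sample = "# sample entries\ninit:3:initdefault:\n"
print(default_run_level(sample))
```

Run against the sample text, the sketch reports run level 3, the same initdefault value used in the transition examples later in this section.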

The following list describes the general characteristics of each HP-UX run level:

run level 0

When run level 0 is initiated, HP-UX transitions from whatever run level it is currently in through all lower run levels and halts. In the process of transitioning down through the run levels it cleanly terminates all running processes and writes any memory based information to disk, ensuring properly structured file system linkages on disk.

run level s

Also known as single user mode, run level s allows input only from the terminal (or pseudo-terminal) known as the system console. This allows one user, usually a system administrator, to have exclusive access to the server usually for maintenance operations that must be done on a quiescent system.

By default, in run level s, only the root file system is mounted and many subsystems such as the line printer spooling system and networking are not running.

NOTE: There is a different but similar run level known as S (uppercase S). It is functionally the same as run level s (lowercase), except that the capabilities of the true system console are switched to the terminal where you are logged in, making it a virtual system console. With modern remote access to a server through its management processor, the distinction between run levels s and S is largely semantic.

run level 1

Just above run level s is run level 1. In run level 1 the system is still dedicated to one user but all file systems are mounted and a process known as the syncer is running. The syncer periodically writes any cached memory based file system changes to disk to make sure that the disk based view of a file system’s state matches that of the memory based view of the file system’s state. See sync(1M).

run level 2

Multiuser level. Run level 2 is the first of the run levels that allow multiple users to log in simultaneously from different locations. Run levels 3, 4, 5, and 6 also allow this and each of those run levels adds additional capabilities over all previous run levels.

run level 3

At run level 3, the ability to export NFS file systems is activated. If your server has file systems that you need to access from other servers via NFS mounts, use at least run level 3. Additionally, web-based administration and graphical presentation managers such as CDE begin at run level 3.

run level 4

Currently undefined. Available for user customization.

run level 5

Currently undefined. Available for user customization.

run level 6

Currently undefined. Available for user customization.

NOTE: As described in the previous list, run levels appear to be additive, and based on the default contents of the /etc/inittab file, they generally are. However, it is possible to have processes started at a lower run level that are not available at higher run levels. Each process represented in the /etc/inittab file specifies at which levels it will be active.

Startup and Kill Scripts (Run Level Transitions)

In the past, much more of the system startup process was configured in the /etc/inittab file. Currently, most system services are started and stopped by the /sbin/rc daemon, which is called by init each time you change the system run level.

/sbin/rc (the “rc daemon”) performs the following actions:

  1. Runs the script /sbin/rc.utils which is responsible for preparing your system console to display the one-line messages you see on the system console during run level transitions. /sbin/rc.utils also logs output from startup and shutdown scripts to the file /etc/rc.log.

  2. Runs /etc/rc.config, which processes all scripts in the directory /etc/rc.config.d. The scripts in /etc/rc.config.d set variables that control the execution of the startup and shutdown scripts that are subsequently run by the rc daemon.

    IMPORTANT: You control what the startup and shutdown (kill) scripts do by setting variables in their corresponding scripts in the /etc/rc.config.d directory.

    Do not edit the scripts in the /sbin/init.d directory directly. These scripts might be replaced during a patch installation or product update, causing you to lose your changes.

  3. Searches the appropriate /sbin/rc#.d directory for scripts to run (and runs them).

    • If the run level being transitioned to is higher than the current run level, then the # in rc#.d represents the run level one higher than the current run level and the scripts in the rc#.d directory with names beginning with “S” are run.

    • If the run level being transitioned to is lower than the current run level, then the # in rc#.d represents the run level one lower than the current run level and the scripts in the rc#.d directory with names beginning with “K” are run.

    This step is repeated for each run level between the current run level and the target run level.

  4. Each startup (or kill) script is first run with the start_msg (or stop_msg) parameter causing it to output its one line message on the system console, and then run again with the start (or stop) parameter to perform its function (based on what variables were previously set from the file in the /etc/rc.config.d directory).
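
The two-pass invocation in step 4 can be sketched in Python as follows. The sketch only builds the list of (link name, argument) pairs that would be used for one rc#.d directory. Two points are assumptions rather than statements from this guide: the links are processed in sorted name order, and each script's message pass is immediately followed by its action pass. The function name is hypothetical.

```python
def invocations(link_names, prefix):
    """Build (link name, argument) pairs for one rc#.d directory.

    prefix is "S" for a transition up or "K" for a transition down.
    """
    if prefix == "S":
        msg_arg, run_arg = "start_msg", "start"
    else:
        msg_arg, run_arg = "stop_msg", "stop"

    calls = []
    # Assumption: links are handled in sorted name order.
    for name in sorted(n for n in link_names if n.startswith(prefix)):
        calls.append((name, msg_arg))  # first pass: one-line console message
        calls.append((name, run_arg))  # second pass: perform the action
    return calls


links = ["S520syncer", "S100localmount", "K500inetd"]
for call in invocations(links, "S"):
    print(call)
```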

Example 3-4 Run Level Transition Examples

The following two examples show what happens during two typical situations:

Transition up

The file /etc/inittab contains an entry telling init that the initial run level for the system during boot-up should be run level 3:

init:3:initdefault:

To reach run level 3, the system transitions:

  • From run level 0 (the halted state)

  • to run level 1 (running scripts pointed to by links in the /sbin/rc1.d directory whose names begin with the letter S — for example, /sbin/rc1.d/S100localmount, /sbin/rc1.d/S520syncer, and others).

  • to run level 2 (running scripts pointed to by links in the /sbin/rc2.d directory whose names begin with the letter S — for example, /sbin/rc2.d/S500inetd, /sbin/rc2.d/S900samba, and others).

  • and finally to run level 3 (running scripts pointed to by links in the /sbin/rc3.d directory whose names begin with the letter S — for example, /sbin/rc3.d/S823hpws_webmin, /sbin/rc3.d/S823hpws_webproxy, and others).

Transition down

If HP-UX is currently in run level 3 and a system administrator with appropriate privileges executes the command:

/sbin/init 1

The system will transition:

  • From run level 3

  • to run level 2 (running scripts pointed to by links in the /sbin/rc2.d directory whose names begin with the letter K — for example, /sbin/rc2.d/K177hpws_tomcat, and others).

  • and finally to run level 1 (running scripts pointed to by links in the /sbin/rc1.d directory whose names begin with the letter K — for example, /sbin/rc1.d/K500inetd, and others).
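
The two transitions above follow the same rule, which the following Python sketch expresses: step one run level at a time toward the target, using the S links of each level entered on the way up and the K links of each level entered on the way down. The function name is hypothetical, and the sketch handles only numeric run levels (not s or S).

```python
def transition_plan(current, target):
    """List the (directory, link prefix) pairs visited between two run levels."""
    plan = []
    if target > current:
        # Moving up: run the S links of each higher level in turn.
        for level in range(current + 1, target + 1):
            plan.append(("/sbin/rc%d.d" % level, "S"))
    elif target < current:
        # Moving down: run the K links of each lower level in turn.
        for level in range(current - 1, target - 1, -1):
            plan.append(("/sbin/rc%d.d" % level, "K"))
    return plan


print(transition_plan(0, 3))  # boot to run level 3
print(transition_plan(3, 1))  # /sbin/init 1 from run level 3
```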

Commands for Manipulating System Run Levels

The following commands can be used to set, change, and view HP-UX run levels:

init

init is both a daemon and a command.

The init command interacts with the init daemon. You use the init command to set or change run levels.

The init daemon, started at boot time, spawns processes as defined in the /etc/inittab file. These processes in turn control how HP-UX interacts with the outside world (for example, which terminals to accept input from, and whether or not to export local file systems via NFS for use by other servers).

NOTE: If your goal is to transition HP-UX to single-user mode from a higher run level, do not use init s. This could leave processes running and disks mounted that you do not want present.

Use the shutdown command with no parameters to transition to run level s. To be absolutely certain that no undesirable processes or mounted file systems are present, reboot the system to single-user mode by interrupting the boot process and using the secondary boot loader (hpux.efi for Integrity servers or hpux for HP 9000 servers) to override the default run level.

who -r

The -r option of the who command displays the current system run level, the date and time the current run level was entered, and three state fields representing the current run level, how many times that run level was previously entered (since the system was booted), and the previous run level (from which the current run level was entered).

Example:

who -r

. run-level 3 Jun 27 06:22 3 1 4

This output indicates:

  • The system is currently in run level 3.

  • It entered the current run level on June 27th at 22 minutes after six in the morning.

  • The current run level (3) was previously entered one (1) time since the system was last booted, and it was entered from the previous run level, which was four (4).
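
For scripts that need these fields, the following Python sketch splits a line shaped like the example output into named values. It assumes the whitespace-separated layout shown in the example; the function name and dictionary keys are hypothetical, and real output might differ in spacing or date format.

```python
def parse_who_r(line):
    """Split a 'who -r' style line into named fields."""
    tokens = line.split()
    i = tokens.index("run-level")
    return {
        "run_level": tokens[i + 1],
        "entered": " ".join(tokens[i + 2:i + 5]),  # month, day, time
        "state": tokens[i + 5],
        "times_entered": int(tokens[i + 6]),
        "previous": tokens[i + 7],
    }


fields = parse_who_r(". run-level 3 Jun 27 06:22 3 1 4")
print(fields["run_level"], fields["entered"], fields["previous"])
```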

Starting (Booting) HP-UX

HP-UX based systems go through the following sequential steps when you power them on or reset them:

  1. Hardware and/or firmware-based routines on board the processors and I/O cards perform self-tests and initialize those items along with enough memory to continue the boot process. They also locate and initialize communications with console display and keyboard devices, and a boot device.

  2. Pre-boot firmware/software routines then load and execute the HP-UX boot loader.

  3. The HP-UX boot loader:

    • Locates, opens, and reads the kernel file and copies the kernel into memory

    • Initiates the HP-UX kernel

  4. HP-UX goes through its initialization process and begins normal operation.

For complete details on the HP-UX boot process and its possible variations, see HP-UX System Administrator’s Guide: Routine Management Tasks.

Stopping (Shutting Down) HP-UX

“READY . . . SET . . . GO!” As with the famous phrase, there is a definite order that you should follow to shut down your system, or you may encounter problems.

When shutting down an HP-UX system:

  1. First, notify everyone who is likely to be affected by the shutdown, giving them a chance to complete work in progress, and if necessary unmount file systems that were NFS-mounted from your system.

  2. Then, shut down any programs you might be running that would not be safely shut down by one of the system’s kill scripts (see “Startup and Kill Scripts (Run Level Transitions)”).

  3. Finally, use the shutdown command to shut down the system. The shutdown command:

    1. allows you to notify the users of the system of the shutdown in progress if you have not previously done so, or to remind those users that the shutdown is imminent.

    2. transitions backward through the run levels (executing the kill links in the directories /sbin/rc[0-4].d)

    3. and finally calls reboot() to perform a sync() operation that ensures memory structures are written to disk before memory is overwritten by the subsequent boot.

For details on the HP-UX shut down process, see HP-UX System Administrator’s Guide: Routine Management Tasks.

Abnormal Shutdowns (System Crashes)

When your system crashes, it is important to know why so that you can take actions to prevent it from happening again. Sometimes, it is easy to determine why: for example, if somebody trips over the cable connecting your computer to the disk containing your root file system (disconnecting the disk).

At other times, the cause of the crash might not be so obvious. In extreme cases, you might want or need to analyze a snapshot of the computer’s memory at the time of the crash, or have HP do it for you, in order to determine the cause of the crash.

Overview of the Dump / Save Cycle

When the system crashes, in order to preserve the evidence of what caused the crash, HP-UX tries to save the image of physical memory, or certain portions of it, to predefined locations called dump devices. When the system is subsequently rebooted, a special utility copies the memory image from the dump devices to the HP-UX file system area.

Figure 3-8 The Crash Dump Sequence


When the memory image is in the HP-UX file system, you can analyze it with a debugger or save it to removable media for shipment to someone else for analysis.

There are multiple ways that dump devices can be configured:

  • In the kernel

  • During system initialization when the initialization script for crashconf runs (and reads entries from the /etc/fstab file)

  • During run time, by an operator or administrator manually running the /sbin/crashconf command

Preparing for a System Crash

The dump process exists so that you have a way of capturing what your system was doing at the time of a crash. This is not for recovery purposes; processes cannot resume where they left off following a system crash. Rather, this is for analysis purposes in order to help you determine why the system crashed and hopefully prevent it from happening again.

If you want to be able to capture the memory image of your system when a crash occurs (for later analysis), you need to define in advance the location(s) where HP-UX will put that image at the time of the crash. This location can be on local disk devices or logical volumes.

Wherever you decide that HP-UX should put the dump, it is important to have enough space at the dump location (see “How Much Dump Space You Need”). If you do not have enough space, not every page selected to be dumped will be saved and you might not capture the part of memory that contains the instruction or data that caused the crash.

If necessary, you can define more than one dump device so that if the first one fills up, the next one is used to continue the dumping process until the dump is complete or no more defined space is available. Beginning with HP-UX 11i version 3 you can even configure multiple dump devices to be written to in parallel (rather than one after the other), significantly cutting down dump times.
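
The sequential case (one device after the other) can be pictured with a short Python sketch. It fills each device in order and reports how much of the dump, if any, could not be saved. The function name and the sizes are hypothetical; the sketch ignores header overhead, compression, and the parallel mode.

```python
def fill_dump_devices(dump_mb, device_sizes_mb):
    """Fill devices in order; return (MB used per device, MB left unsaved)."""
    remaining = dump_mb
    usage = []
    for size in device_sizes_mb:
        used = min(size, remaining)  # a device takes only what still needs saving
        usage.append(used)
        remaining -= used
    return usage, remaining


print(fill_dump_devices(10, [4, 4, 4]))  # dump fits across three devices
print(fill_dump_devices(10, [4, 4]))     # defined space runs out first
```

A nonzero second value corresponds to the situation described earlier: not every page selected to be dumped is saved.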

How Much Dump Space You Need

To guarantee that you have enough dump space, define a dump area that is at least as big as your computer’s physical memory, plus one megabyte. If you are doing a selective dump (which is the default dump mode in most cases), much less dump space will actually be needed. Full dumps require dump space equal to the size of your computer’s memory plus a little extra for header information.

In HP-UX Release 11i, compressed dumps are enabled by default; however, dump compression will only occur if conditions in the crash environment are favorable. Do not plan your dump storage space based on potential compression; allow enough space for an uncompressed full or selective dump. For more information on compressed dumps, see “Compressed Dumps”.

Dump Configuration Decisions

As computers continue to grow in speed and processing power, they also tend to grow in physical memory size. Where once a system with 256MB of memory was considered to be a huge system, today it is barely adequate for most tasks. Some of today’s HP-UX systems can have terabytes of memory. This is important to consider because the larger the size of your computer’s physical memory the longer it will take to dump its contents following a system crash (and the more disk space the dump will consume).

Usually, when your system crashes it is important to get it back up and running as fast as possible. If your computer has a very large amount of memory, the time it takes to dump that memory to disk might be unacceptably long when you are trying to get the system back up quickly. And, if you happen to already know why the computer crashed (for example, if somebody accidentally disconnected the wrong cable), there’s little or no need for a dump anyway.

With HP-UX, a runtime dump subsystem gives you a lot more control over the dump process. With it you can override dump definitions configured into the kernel while the system is running. An operator at the system console can even override the runtime configuration as the system is crashing.

You have control over the following crash dump features:

  • Which classes of memory get dumped.

  • Run-time crash dump configuration. It is no longer necessary to build your dump configuration into the kernel file or to reboot the system to change the crash dump configuration.

  • Whether or not a dump is compressed.

These capabilities give you a lot of flexibility, but you need to make some important decisions regarding how you will configure your system dumps.

There are three main criteria to consider: system recovery time, crash information integrity, and disk space needs. Select the one that is most important to you and read the corresponding section.

System Recovery Time

Use this section if your most important criterion is to get your system back up and running as soon as possible. The factors you have to consider here are:

Dump Level: Full Dump, Selective Dump, or No Dump

In addition to being able to choose “dump everything” or “dump nothing,” you have the ability to determine which classes of memory pages get dumped, allowing you to capture important memory structures without having to dump the whole contents of memory.

You are reading this section because system recovery time is critical to you. Obviously, the fewer pages your system needs to dump to disk (and on reboot copy to the HP-UX file system area), the faster your system can be returned to service. Therefore, when system recovery time is critical, avoid using the full dump option.

When you define dump devices, whether in a kernel build or at run time, you can list which classes of memory must always get dumped and which classes of memory should not be dumped. If you leave both of these lists empty HP-UX will decide for you which parts of memory should be dumped based on what type of error occurred. In nearly all cases it is best to let HP-UX determine which pages to dump.

IMPORTANT: You can interrupt the dump at any time by pressing the ESC (escape) key. It can take as long as 15 seconds to abort.

If you interrupt a dump, it will be as though a dump never occurred; that is, you will not get a partial dump.

Even if you have defined that you do not want a full dump to be performed, an operator at the system console at the time of a crash can override those definitions and request a full dump.

Likewise, if at the time of a crash you know what caused it (and therefore do not need the system dump) but have previously defined a full or selective dump, an operator at the system console can override those definitions and request that no dump be performed.

Concurrent Dumps

On servers with very large amounts of memory, the process of writing memory contents to disk can take a very long time. If you have multiple devices configured to receive the memory dump you can configure HP-UX to split the task of dumping memory and write to the multiple devices in parallel. This process is called dump concurrency and is configured using either the kernel tunable dump_concurrent_on (see dump_concurrent_on(5)), or the crash-processing configuration command crashconf (see crashconf(1M)).

NOTE: Concurrent dump performance improvements are not likely to occur on systems with only one instance of any of the crash dump resources (for example, only one dump device or only one core). Also, concurrent dump performance improvements are currently supported only on HP Integrity servers.

Compressed Dumps

Following a system crash, the HP-UX operating system can use this feature to compress data from memory before it writes the data to the dump device. Compression decreases the volume of crash data, making the dump times faster.

Reducing the time required to store the entire dump shortens the recovery period, so your system can be returned to service much sooner. Dump compression provides a greater time saving on systems that have large amounts of memory.

  • Dump compression is not forced; it is only a user request that will be honored if possible.

    At the time of a system crash the dump subsystem examines the state of the system and its resources to determine whether it is possible to use compression. Depending on the resources available, HP-UX determines dynamically whether to dump using compressed or uncompressed format.

    (For example, if the processor that is processing the crash fails to assign a sufficient number of processors to do the compression, the dump will not be compressed. A recursive crash, such as a panic during the processing of a previous dump, also causes the system to dump using uncompressed format.)

  • For selective dumps that exclude unused pages, you can expect the dump to take about one-third the time of uncompressed dumps on the same server. This interval includes the time required to run the savecrash program and write the dump to its final storage location on the HP-UX file system. A dump that previously took 3 hours to complete should now take only about an hour.

  • You can use the crashconf command (see crashconf(1M)) to disable or enable compressed dumps. (Compression is configured into the kernel by default.) During a crash event you can also choose to override the previously defined dump compression setting.

    Normally, there is no benefit in disabling compression unless the initial (compressed) dump is corrupt and you want to attempt an uncompressed dump on a subsequent crash event. Compressed saves (to the HP-UX file system area) are only possible with sequential dumps.

  • Using the command crashutil, you can convert the compressed dump file to any of several dump formats for storage and analysis. See the manpage crashutil(1M) for detailed information on how to do this and what dump formats are available.

  • A compressed dump file requires less disk storage space and creates a smaller tar file that takes less time to copy to tape or transmit for analysis (for example, via ftp).

  • If your server uses virtual partitions (vPars), the dump might not be compressed but the dump process will proceed.

  • If more than one crash occurs in close succession, it might not be possible for HP-UX to compress the dump.
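
The "request, not guarantee" behavior described in this list can be summarized in a small Python sketch. It is a deliberate simplification: the real dump subsystem examines more of the system state than these three inputs, and the function and parameter names are hypothetical.

```python
def dump_format(compression_requested, enough_processors, recursive_crash):
    """Return the format a dump would use under the simplified rules above."""
    if not compression_requested:
        return "uncompressed"
    # Compression is honored only when the crash environment allows it.
    if recursive_crash or not enough_processors:
        return "uncompressed"
    return "compressed"


print(dump_format(True, True, False))
print(dump_format(True, False, False))
```

Because the outcome is decided at crash time, the sketch supports the earlier advice to size dump space for an uncompressed dump.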

Compressed Save versus Noncompressed Save

System dumps can be very large, so large that your ability to store them in your HP-UX file system area can be taxed.

The boot time utility called savecrash can be configured (by editing the file /etc/rc.config.d/savecrash) to compress or not compress the data as it copies the memory image from the dump devices to the HP-UX file system area during the reboot process. This has system recovery time implications in that compressing the data can take longer if the saving occurs as foreground processing (for example, when HP-UX is trying to quickly evacuate a dump device that is also used for paging). So, if you have the disk space and require that your system be back up and running as quickly as possible, configure savecrash to not compress the data.

Using a Device for Both Paging and Dumping (System Recovery Time)

It is possible to use a specific device for both paging (swap space) and as a dump device. However, if system recovery time is critical to you, do not configure the primary paging device as a dump device. From the savecrash(1M) manpage:

  • By default, when the primary paging device is not used as one of the dump devices or after the crash image on the primary paging device has been saved, savecrash runs in the background. This reduces system boot-up time by allowing the system to be run with only the primary paging device.

Another advantage to keeping your paging and dump devices separate is that paging will not overwrite information stored on a dump device, no matter how long the system has been up or how much activity has taken place. Therefore, you can prevent savecrash processing at boot time (by editing the file /etc/rc.config.d/savecrash). This can save you a lot of time at boot time by allowing you to save the memory image after the server has been returned to service. After the server is up and running you can run savecrash manually to copy the memory image from the dump area to the HP-UX file system area.

Partial Saves

If a memory dump resides partially on dedicated dump devices and partially on devices that are also used for paging, you can choose to save (to the HP-UX file system) only those pages that are endangered by paging activity. Pages residing on the dedicated dump devices can remain there. If you know how to analyze memory dumps, it is even possible to analyze them directly from the dedicated dump devices using a debugger that supports this feature.

Before sending your memory dump to someone else for analysis you must move the dumped pages from the dedicated dump devices to the HP-UX file system. You can then use a utility such as pax or tar to bundle them up for shipment.

Crash Information Integrity

Use this section if your most important criterion is to make sure you capture the part of memory that contains the instruction or piece of data that caused the crash. The factors you have to consider here are:

Full Dump versus Selective Dump

You have chosen this section because it is critical to you to capture the specific instruction or piece of data that caused your system crash. The only way to guarantee that you have it is to capture everything. This means selecting to do a full dump of memory.

Be aware, however, that this can be costly from both a time and a disk space perspective. From the time perspective, it can take quite a while to dump the entire contents of memory from an HP-UX instance using very large amounts of memory. It can take an additional large amount of time to copy that memory image to the HP-UX file system area during the reboot process.

From the disk space perspective, if you have large amounts of memory (some HP-UX servers can have terabytes of memory), you will need an amount of dump area at least equal to the amount of memory in your system; and, depending on a number of factors, you will need additional disk space in your HP-UX file system area equaling the amount of physical memory in your system, in the worst case.

Dump Definitions Built into the Kernel

You can configure HP-UX dump devices using one or more of the following methods:

  • Preferred Method: At run time using the /sbin/crashconf command

  • At boot time (entries defined in the /etc/fstab file)

  • During kernel configuration (put the definitions in the /stand/system file). This method is obsolescent and should no longer be used!

Definitions at each of these places add to or replace any previous definitions from the other sources. However, consider the following situation:

Example 3-5 Example of a Crash During the Very Early Stages of the Boot Process

Consider a server that has ten gigabytes (10 GB) of physical memory. If you were to define system dump devices with a total of two gigabytes (2 GB) of space in the kernel file, and then define an additional nine gigabytes (9 GB) of disk space in the /etc/fstab file, you would have enough dump space to hold the entire memory image (a full dump) by the time the system was fully up and running.

But, what if a crash occurs before /etc/fstab is processed? Only the amount of dump space already configured will be available at the time of the crash; in this example, two gigabytes of space.
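
The arithmetic in this example can be checked with a short Python sketch that tracks how much dump space exists after each configuration stage and whether a full dump would fit at that point. The function name is hypothetical, and the comparison ignores the small amount of extra space needed for header information.

```python
def dump_space_by_stage(stages_gb, memory_gb):
    """Return (stage, cumulative GB, full dump fits?) for each stage in order."""
    total = 0
    report = []
    for name, added in stages_gb:
        total += added
        report.append((name, total, total >= memory_gb))
    return report


# Values from the example: 2 GB in the kernel, 9 GB more from /etc/fstab,
# on a server with 10 GB of physical memory.
stages = [("kernel", 2), ("/etc/fstab", 9)]
for row in dump_space_by_stage(stages, memory_gb=10):
    print(row)
```

Only the second stage reports enough space, which is the window the example describes.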

If it is critical to you to capture every byte of memory in all instances, including the early stages of the boot process, use crashconf with the -s option (which tells crashconf to retain dump device definitions across reboots) to define enough dump space in advance to account for this. crashconf is the preferred method for defining dump devices in HP-UX 11i version 3.

NOTE: The preceding example is presented for completeness. The actual amount of time between the point where kernel dump devices are activated and the point where runtime dump devices are activated is very small (a few seconds), so the window of vulnerability for this situation is very small.

Using a Device for Both Paging and Dumping (Crash Integrity)

It is possible to use a specific device for both paging purposes and as a dump device. But, if crash dump integrity is critical to you, this is not recommended. From the savecrash(1M) manpage:

  • If savecrash determines that a dump device is already enabled for paging, and that paging activity has already taken place on that device, a warning message will indicate that the dump may be invalid. If a dump device has not already been enabled for paging, savecrash prevents paging from being enabled to the device by creating the file /etc/savecore.LCK. swapon does not enable the device for paging if the device is locked in /etc/savecore.LCK.

So, if possible, avoid using a given device for both paging and dumping, particularly the primary paging device.

HP-UX systems configured with small amounts of memory and using only the primary swap device as a dump device are in danger of not being able to preserve the dump (copy it to the HP-UX file system area) before paging activity destroys the data in the dump area. HP-UX systems configured with larger amounts of memory are less likely to need paging (swap) space during startup, and are therefore less likely to destroy a memory dump on the primary paging device before it can be copied.

Disk Space Needs

Use this section if you have very limited disk resources for the post-crash dump and/or the post-reboot save of the memory image to the HP-UX file system area. The factors you have to consider here are:

Dump Level

You are reading this section because disk space is a limited resource on your server. Obviously, the fewer pages that you have to dump, the less space is required to hold them. Therefore, unless your server also has a small amount of physical memory, a full dump is not recommended. If disk space is very limited, you can always choose no dump at all.

However, there is a happy medium, and it happens to be the default dump behavior, which is called a selective dump. HP-UX does a pretty good job of determining which pages of memory are the most critical for a given type of crash, and saves only those. By choosing this option, you can save a lot of disk space on your dump devices, and again later, in your HP-UX file system area. For instructions on how to do this, see HP-UX System Administrator’s Guide: Routine Management Tasks.

Compressed Save versus Non-compressed Save

Regardless of whether you choose to do a full or selective save, whatever is saved on the dump devices usually needs to be copied to your HP-UX file system area before you can use it.

If the disk space shortage on your system is in the HP-UX file system area (not in the dump devices), you can choose to have savecrash (the boot time utility that does the copy) compress your data as it makes the copy.

Partial Save (savecrash -p)

If you have plenty of dump device space but are limited on space in your HP-UX file system, you can use the -p option to the savecrash command. With this option, savecrash copies only those pages on dump devices that are endangered by paging activity (the pages residing on devices that are being used for both paging and dumping). Pages that are on dedicated dump devices are not copied.
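
The selection rule behind a partial save can be sketched in a few lines of Python. This is only an illustrative model, not part of savecrash: the DumpDevice class, the devices_to_copy function, and the device names are hypothetical and exist only to show which devices a partial save would read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DumpDevice:
    """Hypothetical description of one configured dump device."""
    name: str
    also_paging: bool  # True if the device is used for paging as well as dumping


def devices_to_copy(devices, partial):
    """Return the names of the devices whose pages a save would copy.

    partial=True models the -p behavior described above: only devices
    shared with paging are copied, because only their contents are
    endangered by paging activity.
    """
    if partial:
        return [d.name for d in devices if d.also_paging]
    return [d.name for d in devices]


# Hypothetical configuration: one shared device, one dedicated device.
devices = [
    DumpDevice("primary_swap", also_paging=True),
    DumpDevice("dedicated_dump", also_paging=False),
]
print(devices_to_copy(devices, partial=True))  # ['primary_swap']
```

In this model the dedicated device is skipped, which is why a partial save needs less file system space than a complete one.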

NOTE: It is possible to analyze a crash dump directly from dump devices using a debugger that supports this feature[7]. But, if you need to save it to removable media or send it to someone you will first need to copy the memory image to the HP-UX file system area.

For More Information on Defining Dump Devices

The following resources have additional information on defining dump devices:

  • HP-UX System Administrator's Guide: Routine Management Tasks (Chapter 2: Booting and Shutdown)

  • The manpage crashconf(1M) describes the primary command used to configure crash dumps.

  • The manpage savecrash(1M) describes the various options for saving crash dumps to a file system area for later analysis or archiving.

  • The manpage crashutil(1M) describes the utility for converting crash dumps into various formats for later analysis. Similar to savecrash, crashutil can also be used to retrieve crash dump information from raw dump devices into the HP-UX file system area.

What Happens When the System Crashes

An HP-UX system crash (system panic) is an unusual event. When a panic occurs, it means that HP-UX encountered a condition that it did not know how to handle (or could not handle). Sometimes you know right away what caused the crash. Other times the cause is not readily apparent. It is for this reason that HP-UX is equipped with a dump procedure to capture the contents of memory at the time of the crash for later analysis.

You define in advance:

  • Where you want memory contents dumped (dump devices)

  • Whether or not you want the dump to be compressed to save space on your dump devices (dump compression)

  • Whether or not to dump to multiple devices in parallel to save time (dump concurrency)

Use the /sbin/crashconf command to configure these options. See the crashconf(1M) manpage for details on how to configure the various options.

Operator Override Options

When HP-UX panics, the current dump control option settings are displayed at the system console. You then have 10 seconds to interact with the system console before the current settings are used to proceed with dump processing.

If you choose to interact with the system during the 10-second override period, follow the on-screen prompts.

You can choose to do the following:

C option

[CURRENT] Proceed with the current settings. Use this to immediately proceed with the current settings, not waiting for the 10-second override period to automatically expire.

S option

[SELECTIVE] Proceed with a selective dump with both compression and concurrency turned off, regardless of what was previously configured.

F option

[FULL DUMP] This option is available if enough dump space has been configured to hold the contents of the entire physical memory. Select this option to dump the contents of all physical memory. With this option, compression and concurrency are off.

P option

[PARTIAL DUMP] This option is available instead of the full dump option if there is not enough dump space configured to hold a full dump. The amount of memory that will be dumped is displayed on the console. With this option, compression and concurrency are off.

N option

[NO DUMP] Do not perform a dump. Immediately reboot the system. Use this option if you know the cause of the panic and do not need a dump.
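
The rules above can be summarized as a small Python model. It is a sketch for illustration only: the function names, the tuple layout, and the use of kilobytes as the unit are assumptions, not the way HP-UX implements the override menu.

```python
def override_options(dump_space_kb, physical_memory_kb):
    """List the override keys offered during the 10-second window.

    F is offered only when the configured dump space can hold all of
    physical memory; otherwise P is offered in its place.
    """
    options = ["C", "S"]
    if dump_space_kb >= physical_memory_kb:
        options.append("F")
    else:
        options.append("P")
    options.append("N")
    return options


def effective_settings(choice, configured):
    """Return (level, compression, concurrency) for a key press.

    configured is the (level, compression, concurrency) tuple that was
    set up in advance. S, F, and P always turn compression and
    concurrency off; C keeps the configured values unchanged.
    """
    if choice == "C":
        return configured
    if choice == "N":
        return ("none", False, False)
    level = {"S": "selective", "F": "full", "P": "partial"}[choice]
    return (level, False, False)


# Hypothetical server: 8 GB of memory but only 4 GB of dump space.
print(override_options(4 * 1024 * 1024, 8 * 1024 * 1024))  # ['C', 'S', 'P', 'N']
```

Note that only the C option preserves a configured compression or concurrency setting in this model; every other choice that produces a dump turns both off.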

The Dump

After the operator is given a chance to override the current dump level, or the 10-second override period expires, HP-UX writes the physical memory contents to the dump devices until one of the following conditions is true:

  • The entire contents of memory are dumped (if a full dump was configured or requested by the operator)

  • The entire contents of selected memory pages are dumped (if a selective dump was configured or requested by the operator)

  • Configured dump device space is exhausted

Depending on the amount of memory being dumped, and a number of other factors, this process can take from a few seconds to hours.

While the dump is occurring, status messages on the system console indicate the dump’s progress.

IMPORTANT: You can interrupt the dump at any time by pressing the ESC (escape) key. It can take as long as 15 seconds to abort.

If you interrupt a dump, it will be as though a dump never occurred; that is, you will not get a partial dump.
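
The stopping conditions and the effect of an interruption can be expressed as a short Python sketch. The function name and the page-count parameters are hypothetical; the model only restates the rules in this section and ignores compression.

```python
def pages_preserved(level, total_pages, selected_pages,
                    device_capacity_pages, interrupted=False):
    """Return how many memory pages end up usable on the dump devices.

    level is "full" or "selective". The dump stops when the wanted
    pages are written or when device space runs out, whichever comes
    first. An interrupted dump leaves nothing usable.
    """
    if level not in ("full", "selective"):
        raise ValueError("level must be 'full' or 'selective'")
    if interrupted:
        return 0
    wanted = total_pages if level == "full" else selected_pages
    return min(wanted, device_capacity_pages)


# Hypothetical numbers: a full dump of 1000 pages onto 600 pages of space.
print(pages_preserved("full", 1000, 200, 600))       # 600
print(pages_preserved("selective", 1000, 200, 600))  # 200
```

With these numbers the full dump is cut short by device space, while the selective dump fits completely.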

Following the dump, the system attempts to reboot.

The Reboot

After the dumping of physical memory pages is complete, the system attempts to reboot (if the AUTOBOOT flag is set). For information on the AUTOBOOT flag, see HP-UX System Administrator’s Guide: Routine Management Tasks.

The savecrash Processing Option

You can define whether or not you want a process called savecrash to run as your system boots. This process copies (and optionally compresses) the memory image stored on the dump devices to the HP-UX file system area. Space permitting, you can store multiple crash dumps in the file system area in case there is more than one panic event. If you do not run savecrash during or shortly after boot, you risk having only the latest dump available, and only on the dump devices.

Dual-Mode Devices (dump / swap)

By default, savecrash is enabled and performs its copy during the boot process. You can disable this operation by editing the /etc/rc.config.d/savecrash file, setting the SAVECRASH environment variable to a value of 0. This is generally safe to do if your dump devices are not also being used as paging devices.

From the savecrash(1M) manpage:

  • If there is insufficient space in the file system for the portions of the crash dump that need to be saved, savecrash will save as much as will fit in the available space. (Priority is given to the index file, then to the kernel module files, and then to the physical memory image.) The dump will be considered saved, and savecrash will not attempt to save it again, unless there was insufficient space for any of the physical memory image. (See the description of option -r.)

The -r option to savecrash allows you to resave a crash that has already been marked as saved. If a save fails (or if only a partial save was made) due to lack of file system space, you have a chance, once the system is running again, to clean up the file system in order to gain the space you need for the savecrash operation; or you can run the savecrash command manually, specifying an alternate destination for the saved data.
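
The priority order quoted from the manpage can be modeled as follows. This is a sketch under stated assumptions, not the savecrash algorithm: the function name and the kilobyte sizes are hypothetical, and the "considered saved" test reflects one reading of the manpage text, namely that a dump is left unsaved only when none of the physical memory image fit.

```python
def plan_save(free_kb, index_kb, modules_kb, image_kb):
    """Allocate free space in priority order and report the outcome.

    Returns (plan, considered_saved), where plan maps each component
    to the number of kilobytes that fit in the available space.
    """
    remaining = free_kb
    plan = {}
    # Priority: index file, then kernel module files, then memory image.
    for name, size in (("index", index_kb),
                       ("modules", modules_kb),
                       ("image", image_kb)):
        saved = min(size, remaining)
        plan[name] = saved
        remaining -= saved
    # Assumption: the dump counts as saved if any of the image was kept.
    considered_saved = plan["image"] > 0 or image_kb == 0
    return plan, considered_saved


plan, saved = plan_save(free_kb=100, index_kb=10, modules_kb=30, image_kb=200)
print(plan, saved)  # {'index': 10, 'modules': 30, 'image': 60} True
```

In the example the dump is marked as saved even though only part of the image fit, which is the situation where resaving with the -r option after freeing space becomes useful.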

CAUTION: If you are using your devices for both paging and dumping, do not disable the savecrash boot processing or you will lose the dumped memory image to subsequent system paging activity.

What to Do After the System Has Rebooted

After your system has rebooted, one of the first things to do is make sure that the physical memory image dumped to the dump devices has been copied to the HP-UX file system area, so that you can either package it and send it to an expert for analysis or analyze it yourself using a debugger.

NOTE: It is possible to analyze a crash dump directly from dump devices using a debugger that supports this feature. But if you need to save it to removable media, or send it to someone, you first need to copy the memory image to the HP-UX file system area.

Unless you specifically disable savecrash processing during reboot, the savecrash utility copies the memory image for you during the reboot process. By default, it puts the memory image in the /var/adm/crash directory. You can specify a different location by editing the file /etc/rc.config.d/savecrash and setting the environment variable SAVECRASH_DIR to the name of the directory where you would like the dumps to be located. Be sure the destination has enough disk space to hold the copied memory image.
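
As a rough illustration, the following Python sketch reads settings in the style of /etc/rc.config.d/savecrash and checks the destination for free space. The parser is deliberately naive (it treats everything after a # as a comment and does not expand shell syntax), the helper names are hypothetical, and the /crashdumps directory in the sample is made up. Only the SAVECRASH and SAVECRASH_DIR variable names and the /var/adm/crash default come from the text above.

```python
import shutil

DEFAULT_DIR = "/var/adm/crash"


def parse_settings(text):
    """Collect simple NAME=value assignments, ignoring comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def savecrash_plan(text):
    """Return (enabled, directory) for the given configuration text.

    savecrash is treated as enabled unless SAVECRASH is set to 0, and
    the directory falls back to the default when SAVECRASH_DIR is unset.
    """
    settings = parse_settings(text)
    enabled = settings.get("SAVECRASH", "1") != "0"
    directory = settings.get("SAVECRASH_DIR") or DEFAULT_DIR
    return enabled, directory


def has_room(directory, needed_bytes):
    """Check whether the file system holding directory has enough free space."""
    return shutil.disk_usage(directory).free >= needed_bytes


# Hypothetical file contents, used only for demonstration.
sample = """
# SAVECRASH=0 would disable the copy at boot
SAVECRASH=1
SAVECRASH_DIR=/crashdumps
"""
print(savecrash_plan(sample))  # (True, '/crashdumps')
```

A call such as has_room(directory, expected_size) before relying on a new destination is one way to confirm that the copied memory image will fit; the expected size has to come from your own estimate of the dump.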



[7] Analyzing crash dumps is not a trivial task. It requires intimate knowledge of HP-UX internal structures and the use of debuggers. It is beyond the scope of this document to cover the actual analysis process. If you need help analyzing a crash dump, contact your HP representative.

© 2008 Hewlett-Packard Development Company, L.P.