8.3 Signals

We mentioned earlier that typing CTRL-Z to suspend a job is similar to typing CTRL-C to stop a job, except that you can resume the job later. They are actually similar in a deeper way: both are particular cases of the act of sending a signal to a process.

A signal is a message that one process sends to another when some abnormal event takes place or when it wants the other process to do something. Most of the time, a process sends a signal to a subprocess it created. You're undoubtedly already comfortable with the idea that one process can communicate with another through an I/O pipeline; think of a signal as another way for processes to communicate with each other. (In fact, any textbook on operating systems will tell you that both are examples of the general concept of interprocess communication, or IPC.) [6]

[6] Pipes and signals were the only IPC mechanisms in early versions of UNIX. More modern versions like System V and 4.x BSD have additional mechanisms, such as sockets, named pipes, and shared memory. Named pipes are accessible to shell programmers through the mknod(1) command, which is beyond the scope of this book.

Depending on the version of UNIX, there are two or three dozen types of signals, including a few that can be used for whatever purpose a programmer wishes. Signals have numbers (from 1 to the number of signals the system supports) and names; we'll use the latter. You can get a list of all the signals on your system, by name and number, by typing kill -l . Bear in mind, when you write shell code involving signals, that signal names are more portable to other versions of UNIX than signal numbers.
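As a small illustration of why names travel better than numbers, here is a minimal sketch that asks the shell itself to translate a number into a name. It assumes a POSIX-style kill that accepts a number after -l; the variable name is ours.

```shell
# List every signal this system supports (the format varies by shell).
kill -l

# Look up the name that goes with signal number 3 on this system.
name=$(kill -l 3)
echo "signal 3 is $name"
```

A script that uses the name directly never needs to know the number at all.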

8.3.1 Control-key Signals

When you type CTRL-C, you tell the shell to send the INT (for "interrupt") signal to the current job; CTRL-Z sends TSTP (on most systems, for "terminal stop"). You can also send the current job a QUIT signal by typing CTRL-\ (control-backslash); this is sort of like a "stronger" version of CTRL-C. [7] You would normally use CTRL-\ when (and only when) CTRL-C doesn't work.

[7] CTRL-\ can also cause the shell to leave a file called core in your current directory. This file contains an image of the process to which you sent the signal; a programmer could use it to help debug the program that was running. The file's name is a (very) old-fashioned term for a computer's memory. Other signals leave these "core dumps" as well; you should feel free to delete them unless a systems programmer tells you otherwise.

As we'll see soon, there is also a "panic" signal called KILL that you can send to a process when even CTRL-\ doesn't work. But it isn't attached to any control key, which means that you can't use it to stop the currently running process. INT, TSTP, and QUIT are the only signals you can use with control keys. [8]

[8] Some BSD-derived systems have additional control-key signals.

You can customize the control keys used to send signals with options of the stty(1) command. These vary from system to system (consult your man page for the command), but the usual syntax is stty signame char. signame is a name for the signal that, unfortunately, is often not the same as the names we use here. Table 1.7 in Chapter 1, Korn Shell Basics, lists stty names for signals found on all versions of UNIX. char is the control character, which you can give in the same notation we use. For example, to set your INT key to CTRL-X on most systems, use:

stty intr ^X

Now that we've told you how to do this, we should add that we don't recommend it. Changing your signal keys could lead to trouble if someone else has to stop a runaway process on your machine.
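If you do want to experiment anyway, one cautious approach is to save the current terminal settings first and restore them afterwards. The following is only a sketch: it assumes your stty supports the -g option (which prints the settings in a form stty can read back), and the variable name saved is ours.

```shell
# Only touch the terminal if standard input really is one.
if [ -t 0 ]; then
    saved=$(stty -g)     # current settings, in a form stty can read back
    stty intr '^X'       # CTRL-X now sends INT instead of CTRL-C
    # ... try out the new key here ...
    stty "$saved"        # put the original control keys back
fi
```

Restoring from the saved value undoes the change even if you have forgotten what the original key was.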

Most of the other signals are used by the operating system to advise processes of error conditions, like a bad machine code instruction, bad memory address, or division by zero, or "interesting" events such as a user logging out or a timer ("alarm") going off. The remaining signals are used for esoteric error conditions that are of interest only to low-level systems programmers; newer versions of UNIX have more and more arcane signal types.

8.3.2 kill

You can use the built-in shell command kill to send a signal to any process you created, not just the currently running job. kill takes as its argument the process ID, job number, or command name of the process to which you want to send the signal. By default, kill sends the TERM ("terminate") signal, which usually has the same effect as the INT signal that you send with CTRL-C. But you can specify a different signal by using the signal name (or number) as an option, preceded by a dash.

kill is so-named because of the nature of the default TERM signal, but there is another reason, which has to do with the way UNIX handles signals in general. The full details are too complex to go into here, but the following explanation should suffice.

Most signals cause a process that receives them to roll over and die; therefore if you send any one of these signals, you "kill" the process that receives it. However, programs can be set up to "trap" specific signals and take some other action. For example, a text editor would do well to save the file being edited before terminating when it receives a signal such as INT, TERM, or QUIT. Determining what to do when various signals come in is part of the fun of UNIX systems programming.
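To make the idea of trapping concrete, here is a minimal sketch that uses the shell's own trap command. A small child shell arranges to "save its work" when it receives INT, TERM, or QUIT, and then sends itself a TERM. The function save_work is a hypothetical stand-in for whatever cleanup a real program would do.

```shell
# Run a child shell that traps TERM and then signals itself.
result=$(sh -c '
    save_work() { echo "work saved"; exit 0; }   # hypothetical cleanup step
    trap save_work INT TERM QUIT                  # handle these instead of dying
    kill -TERM $$                                 # $$ is the child shell itself
    echo "not reached"                            # the handler exits first
')
echo "$result"
```

Without the trap line, the child shell would simply die when TERM arrived and nothing would be saved.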

Here is an example of kill . Say you have a fred process in the background, with process ID 480 and job number 1, that needs to be stopped. You would start with this command:

$ kill %1

If you were successful, you would see a message like this:

[1] + Terminated                fred &

If you don't see this, then the TERM signal failed to terminate the job. The next step would be to try QUIT:

$ kill -QUIT %1

If that worked, you would see these messages:

fred[1]: 480 Quit(coredump)
[1] +  Done(131)                fred &

The 131 is the exit status returned by fred. [9]

[9] When a shell script is sent a signal, it exits with status 128+N, where N is the number of the signal it received (128 changes to 256 in future releases). In this case, fred is a shell script, and QUIT happens to be signal number 3.

But if even QUIT doesn't work, the "last-ditch" method would be to use KILL:

$ kill -KILL %1

(Notice how this has the flavor of "yelling" at the runaway process.) This produces the message:

[1] + Killed                    fred &

It is impossible for a process to "trap" a KILL signal; the operating system should terminate the process immediately and unconditionally. If it doesn't, then either your process is in one of the "funny states" we'll see later in this chapter, or (far less likely) there's a bug in your version of UNIX.
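The effect of KILL, and the exit status convention described for QUIT above, can be seen together in a short sketch. It assumes a POSIX-style shell in which kill -l accepts an exit status and prints the matching signal name; the sleep command is only a stand-in for a runaway process, and the variable names are ours.

```shell
sleep 60 &                  # stand-in for a runaway background process
pid=$!                      # process ID of the job just started

kill -KILL "$pid"           # cannot be trapped or ignored

status=0
wait "$pid" || status=$?    # collect the exit status the shell recorded

# A status above 128 means "terminated by a signal"; kill -l decodes it.
signame=$(kill -l "$status")
echo "process $pid exited with status $status (signal $signame)"
```

Letting kill -l do the decoding avoids hard-wiring either the 128 offset or the signal number into the script.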

Here's another example.

Task 8.1

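Write a function called killalljobs that kills all background jobs.

The solution to this task is simple, relying on jobs -p. It has to be a function rather than a separate script, because a script runs in its own shell process, which knows nothing about the current shell's jobs:

function killalljobs { kill "$@" $(jobs -p); }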

You may be tempted to use the KILL signal immediately, instead of trying TERM (the default) and QUIT first. Don't do this. TERM and QUIT are designed to give a process the chance to "clean up" before exiting, whereas KILL will stop the process, wherever it may be in its computation. Use KILL only as a last resort!
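The advice above can be sketched as a small helper that takes a process ID and works its way up from TERM to KILL. This is only an illustration: stop_pid is a hypothetical name, the one-second pause is an arbitrary choice, and the existence check relies on kill -0, which sends no signal at all.

```shell
# stop_pid PID: try TERM, then QUIT, then KILL, pausing after each one.
# (pid and sig are ordinary global variables; plain sh has no locals.)
stop_pid() {
    pid=$1
    for sig in TERM QUIT KILL; do
        # kill fails if the process is already gone (or isn't ours).
        kill -"$sig" "$pid" 2>/dev/null || return 0
        sleep 1     # give the process a moment to clean up and exit
        # Signal 0 sends nothing; it only tests whether the process exists.
        kill -0 "$pid" 2>/dev/null || return 0
    done
    return 1        # still there, even after KILL
}
```

You would call it as stop_pid 480. A nonzero return means the process survived even KILL, which points to one of the "funny states" rather than to a stubborn program.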

You can use the kill command with any process you create, not just jobs in the background of your current shell. For example, if you use a windowing system, then you may have several terminal windows, each of which runs its own shell. If one shell is running a process that you want to stop, you can kill it from another window, but you can't refer to it with a job number because it's running under a different shell. You must instead use its process ID.

8.3.3 ps

This is probably the only situation in which a casual user would need to know the ID of a process. The command ps(1) gives you this information; however, it can give you lots of extra information that you must wade through as well.

ps is a complex command. It takes several options, some of which differ from one version of UNIX to another. To add to the confusion, you may need different options on different UNIX versions to get the same information! We will use options available on the two major types of UNIX systems, those derived from System V (such as most of the versions for Intel 386/486 PCs, as well as IBM's AIX and Hewlett-Packard's HP-UX) and BSD (DEC's Ultrix, SunOS). If you aren't sure which kind of UNIX version you have, try the System V options first.

You can invoke ps in its simplest form without any options. In this case, it will print a line of information about the current login shell and any processes running under it (i.e., background jobs). For example, if you invoked three background jobs, as we saw earlier in the chapter, ps on System V-derived versions of UNIX would produce output that looks something like this:

   PID TTY      TIME COMD
   146 pts/10   0:03 ksh
  2349 pts/10   0:03 fred
  2367 pts/10   0:17 bob
  2389 pts/10   0:09 dave
  2390 pts/10   0:00 ps

The output on BSD-derived systems looks like this:

   PID TT STAT  TIME COMMAND
   146 10 S     0:03 /bin/ksh -i
  2349 10 R     0:03 fred
  2367 10 D     0:17 bob -f /dev/rmt0
  2389 10 R     0:09 dave
  2390 10 R     0:00 ps

(You can ignore the STAT column.) This is a bit like the jobs command. PID is the process ID; TTY (or TT) is the terminal (or pseudo-terminal, if you are using a windowing system) the process was invoked from; TIME is the amount of processor time (not real or "wall clock" time) the process has used so far; COMD (or COMMAND) is the command. Notice that the BSD version includes the command's arguments, if any; also notice that the first line reports on the parent shell process, and in the last line, ps reports on itself.

ps without arguments lists all processes started from the current terminal or pseudo-terminal. But since ps is not a shell command, it doesn't correlate process IDs with the shell's job numbers. It also doesn't help you find the ID of the runaway process in another shell window.

To get this information, use ps -a (for "all"); this lists information on a different set of processes, depending on your UNIX version.

8.3.3.1 System V

Instead of listing all of those that were started under a specific terminal, ps -a on System V-derived systems lists all processes associated with any terminal that aren't group leaders. For our purposes, a "group leader" is the parent shell of a terminal or window. Therefore, if you are using a windowing system, ps -a lists all jobs started in all windows (by all users), but not their parent shells.

Assume that, in the above example, you have only one terminal or window. Then ps -a will print the same output as plain ps except for the first line, since that's the parent shell. This doesn't seem to be very useful.

But consider what happens when you have multiple windows open. Let's say you have three windows, all running terminal emulators like xterm for the X Window System. You start background jobs fred, bob, and dave in windows with pseudo-terminal numbers 1, 2, and 3, respectively. This situation is shown in Figure 8.1.

Figure 8.1: Background jobs in multiple windows

Assume you are in the uppermost window. If you type ps , you will see something like this:

   PID TTY      TIME COMD
   146 pts/1    0:03 ksh
  2349 pts/1    0:03 fred
  2390 pts/1    0:00 ps

But if you type ps -a , you will see this:

   PID TTY      TIME COMD
  2349 pts/1    0:03 fred
  2367 pts/2    0:17 bob
  2389 pts/3    0:09 dave
  2390 pts/1    0:00 ps

Now you should see how ps -a can help you track down a runaway process. If it's dave , you can type kill 2389 . If that doesn't work, try kill -QUIT 2389 , or in the worst case, kill -KILL 2389 .
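Picking the process ID out of the listing by eye is easy enough with four lines, but it can be automated. The following sketch assumes System V-style output, in which the command name is the last column; pid_of_cmd is a hypothetical helper that reads a ps listing on its standard input.

```shell
# pid_of_cmd NAME: read ps-style output on standard input and print the
# process ID of every row whose last column is exactly NAME.
pid_of_cmd() {
    awk -v name="$1" 'NR > 1 && $NF == name { print $1 }'
}
```

With the listing above, ps -a | pid_of_cmd dave would print 2389. On BSD-derived systems the COMMAND column can include arguments, so the last column is not always the command name and the test would need adjusting.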

8.3.3.2 BSD

On BSD-derived systems, ps -a lists all jobs that were started on any terminal; in other words, it's a bit like concatenating the results of plain ps for every user on the system. Given the above scenario, ps -a will show you all processes that the System V version shows, plus the group leaders (parent shells).

Unfortunately, ps -a (on any version of UNIX) will not report processes that are in certain pathological conditions where they "forget" things like what shell invoked them and what terminal they belong to. Such processes have colorful names ("zombies," "orphans") that are actually used in UNIX technical literature, not just informally by systems hackers. If you have a serious runaway process problem, it's possible that the process has entered one of these states.

Let's not worry about why or how a process gets this way. All you need to understand is that the process doesn't show up when you type ps -a . You need another option to ps to see it: on System V, it's ps -e ("everything"), whereas on BSD, it's ps -ax .

These options tell ps to list processes that either weren't started from terminals or "forgot" what terminal they were started from. The former category includes lots of processes that you probably didn't even know existed: these include basic processes that run the system and so-called daemons (pronounced "demons") that handle system services like mail, printing, network file systems, etc.

In fact, the output of ps -e or ps -ax is an excellent source of education about UNIX system internals, if you're curious about them. Run the command on your system and, for each line of the listing that looks interesting, invoke man on the process name or look it up in the Unix Programmer's Manual for your system.

User shells and processes are listed at the very bottom of ps -e or ps -ax output; this is where you should look for runaway processes. Notice that many processes in the listing have ? instead of a terminal. Either these aren't supposed to have one (such as the basic daemons) or they're runaways. Therefore it's likely that if ps -a doesn't find a process you're trying to kill, ps -e (or ps -ax ) will list it with ? in the TTY (or TT) column. You can determine which process you want by looking at the COMD (or COMMAND) column.
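Since a full listing can run to hundreds of lines, it may help to filter it down to the rows with ? in the terminal column. Here is a minimal sketch, assuming System V-style output in which TTY is the second column; detached is a hypothetical name for the filter.

```shell
# detached: read ps-style output on standard input and keep the header
# line plus every row whose TTY column (the second field) is "?".
detached() {
    awk 'NR == 1 || $2 == "?"'
}
```

You would use it as ps -e | detached. Most of what remains will be daemons, so you still have to look at the COMD column to pick out the runaway.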