8.4 trap

We've been discussing how signals affect the casual user; now let's talk a bit about how shell programmers can use them. We won't go into too much depth about this, because it's really the domain of systems programmers.

We mentioned above that programs in general can be set up to "trap" specific signals and process them in their own way. The trap built-in command lets you do this from within a shell script. trap is most important for "bullet-proofing" large shell programs so that they react appropriately to abnormal events, just as programs in any language should guard against invalid input. It's also important for certain systems programming tasks, as we'll see in the next chapter.

The syntax of trap is:

trap cmd sig1 sig2 ...

That is, when any of sig1, sig2, etc., are received, run cmd. After cmd finishes, the script resumes execution just after the command that was interrupted. [10]

[10] This is what usually happens. Sometimes the command currently running will abort (sleep acts like this, as we'll see soon); other times it will finish running. Further details are beyond the scope of this book.

Of course, cmd can be a script or function. The sigs can be specified by name or by number. You can also invoke trap without arguments, in which case the shell will print a list of any traps that have been set, using symbolic names for the signals.
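To make the syntax concrete, here is a minimal non-interactive sketch. It is our own illustration rather than one of the scripts in this chapter: it feeds a small script to a child sh, which sends the TERM signal to itself, once by name and once by number. We use printf instead of print so that the sketch also runs in shells other than ksh.

```shell
# Run a small script in a child shell and collect what it prints.
output=$(sh <<'EOF'
trap 'printf "caught TERM\n"' TERM   # handler given as a command string
kill -TERM $$                        # signal ourselves by name
kill -15 $$                          # and again by number (TERM is 15)
printf 'still running\n'             # execution resumes after each handler
trap                                 # no arguments: list the traps now set
EOF
)
printf '%s\n' "$output"
```

The handler runs once per signal, and the script then carries on with the next command. The exact format of the listing printed by the final trap differs from shell to shell.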

Here's a simple example that shows how trap works. Suppose we have a shell script called loop with this code:

while true; do
    sleep 60
done

This will just pause for 60 seconds (the sleep(1) command) and repeat indefinitely. true is a "do-nothing" command whose exit status is always 0. [11] Try typing in this script. Invoke it, let it run for a little while, then type [CTRL-C] (assuming that is your interrupt key). It should stop, and you should get your shell prompt back.

[11] Actually, it's a built-in alias for : , the real shell "no-op."

Now insert the following line at the beginning of the script:

trap "print 'You hit control-C!'" INT

Invoke the script again. Now hit CTRL-C. The odds are overwhelming that you are interrupting the sleep command (as opposed to true). You should see the message "You hit control-C!", and the script will not stop running; instead, the sleep command will abort, and it will loop around and start another sleep. Hit CTRL-\ to get it to stop. Type rm core to get rid of the resulting core dump file.

Next, run the script in the background by typing loop &. Type kill %loop (i.e., send it the TERM signal); the script will terminate. Add TERM to the trap command, so that it looks like this:

trap "print 'You hit control-C!'" INT TERM

Now repeat the process: run it in the background and type kill %loop. As before, you will see the message and the process will keep on running. Type kill -KILL %loop to stop it.

Notice that the message isn't really appropriate when you use kill. We'll change the script so it prints a better message in the kill case:

trap "print 'You hit control-C!'" INT
trap "print 'You tried to kill me!'" TERM

while true; do
    sleep 60
done

Now try it both ways: in the foreground with [CTRL-C] and in the background with kill. You'll see different messages.

8.4.1 Traps and Functions

The relationship between traps and shell functions is straightforward, but it has certain nuances that are worth discussing. The most important thing to understand is that functions can have their own local traps; these aren't known outside of the function. In particular, the surrounding script doesn't know about them. Consider this code:

function settrap {
    trap "print 'You hit control-C!'" INT
}

settrap
while true; do
    sleep 60
done

If you invoke this script and hit your interrupt key, it will just exit. The trap on INT in the function is known only inside that function. On the other hand:

function loop {
    trap "print 'How dare you!'" INT
    while true; do
        sleep 60
    done
}

trap "print 'You hit control-C!'" INT
loop

When you run this script and hit your interrupt key, it will print "How dare you!". But how about this:

function loop {
    while true; do
        sleep 60
    done
}

trap "print 'You hit control-C!'" INT
loop
print 'exiting...'

This time the looping code is within a function, and the trap is set in the surrounding script. If you hit your interrupt key, it will print the message and then print "exiting...". It will not repeat the loop as above.

Why? Remember that when the signal comes in, the shell aborts the current command, which in this case is a call to a function. The entire function aborts, and execution resumes at the next statement after the function call.

The advantage of traps that are local to functions is that they allow you to control a function's behavior separately from the surrounding code.

Yet you may want to define global traps inside functions. There is a rather kludgy way to do this; it depends on a feature that we introduce in the next chapter, which we call a "fake signal." Here is a way to set trapcode as a global trap for signal SIG inside a function:

trap "trap trapcode SIG" EXIT

This sets up the command trap trapcode SIG to run right after the function exits, at which time the surrounding shell script is in scope (i.e., is "in charge"). When that command runs, trapcode is set up to handle the SIG signal.

For example, you may want to reset the trap on the signal you just received, like this:

function trap_handler {
    trap "trap second_handler INT" EXIT
    print 'Interrupt: one more to abort.'
}

function second_handler {
    print 'Aborted.'
    exit 
}

trap trap_handler INT

This code acts like the UNIX mail utility: when you are typing in a message, you must press your interrupt key twice to abort the process.

Speaking of mail , now we'll show a more practical example of traps.

Task 8.2

As part of an electronic mail system, write the shell code that lets a user compose a message.

The basic idea is to use cat to create the message in a temporary file and then hand the file's name off to a program that actually sends the message to its destination. The code to create the file is very simple:

msgfile=/tmp/msg$$
cat > $msgfile

Since cat without an argument reads from the standard input, this will just wait for the user to type a message and end it with the end-of-text character [CTRL-D].

8.4.2 Process ID Variables and Temporary Files

The only thing new about this is $$ in the filename expression. This is a special shell variable whose value is the process ID of the current shell.

To see how $$ works, type ps and note the process ID of your shell process (ksh). Then type print "$$"; the shell will respond with that same number. Now type ksh to start a subshell, and when you get a prompt, repeat the process. You should see a different number, probably slightly higher than the last one.

A related built-in shell variable is ! (i.e., its value is $!), which contains the process ID of the most recently invoked background job. To see how this works, invoke any job in the background and note the process ID printed by the shell next to [1]. Then type print "$!"; you should see the same number.

The ! variable is useful in shell programs that involve multiple communicating processes, as we'll see later.
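As a small sketch of how a script might use these two variables together (our own illustration; the variable name bgpid is hypothetical), the following starts a background job, remembers its process ID from $!, and later uses that ID to stop the job. We use printf rather than print so that it runs in any POSIX-style shell.

```shell
sleep 30 &                        # start a background job
bgpid=$!                          # its process ID, saved before $! can change
printf 'shell is %s, background job is %s\n' "$$" "$bgpid"
kill "$bgpid"                     # send TERM to the background job
wait "$bgpid" 2>/dev/null || true # reap it; its exit status is non-zero
```

Saving $! right away matters because the variable is overwritten each time another background job is started.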

To return to our mail example: since all processes on the system must have unique process IDs, $$ is excellent for constructing names of temporary files. We saw an example of this back in Chapter 2, Command-line Editing: we used the expression .hist$$ as a way of generating unique names for command history files so that several can be open at once, allowing multiple shell windows on a workstation to have their own history files. This expression generates names like .hist234. There are also examples of $$ in Chapter 7 and Chapter 9, Debugging Shell Programs.

The directory /tmp is conventionally used for temporary files. Many systems also have another directory, /usr/tmp, for the same purpose. All files in these directories are usually erased whenever the computer is rebooted.

Nevertheless, a program should clean up such files before it exits, to avoid taking up unnecessary disk space. We could do this in our code very easily by adding the line rm $msgfile after the code that actually sends the message. But what if the program receives a signal during execution? For example, what if a user changes his or her mind about sending the message and hits CTRL-C to stop the process? We would need to clean up before exiting. We'll emulate the actual UNIX mail system by saving the message being written in a file called dead.letter in the current directory. We can do this by using trap with a command string that includes an exit command:

trap 'mv $msgfile dead.letter; exit' INT TERM
msgfile=/tmp/msg$$
cat > $msgfile
# send the contents of $msgfile to the specified mail address...
rm $msgfile

When the script receives an INT or TERM signal, it will save the temp file as dead.letter and then exit. Note that the command string isn't evaluated until it needs to be run, so $msgfile will contain the correct value; that's why we surround the string in single quotes.
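The difference the quotes make can be seen in a short sketch. This is our own illustration, not part of the mail program: it sets one trap in double quotes and one in single quotes before msgfile has its final value, then has a child sh signal itself. The USR1 signal and the placeholder value not-set-yet are chosen only for the demonstration, and printf stands in for print.

```shell
output=$(sh <<'EOF'
msgfile=not-set-yet
trap "printf 'double quotes: %s\n' $msgfile" USR1    # $msgfile expanded right now
trap 'printf "single quotes: %s\n" "$msgfile"' TERM  # expanded when the trap fires
msgfile=/tmp/msg$$
kill -USR1 $$
kill -TERM $$
EOF
)
printf '%s\n' "$output"
```

The double-quoted handler reports the stale placeholder, while the single-quoted one reports the /tmp/msg name that was assigned after the trap was set.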

But what if the script receives a signal before msgfile is created, unlikely though that may be? Then mv will try to rename a file that doesn't exist. To fix this, we need to test for the existence of the file $msgfile before trying to rename it. The code for this is a bit unwieldy to put in a single command string, so we'll use a function instead:

function cleanup {
    if [[ -a $msgfile ]]; then
        mv $msgfile dead.letter
    fi
    exit
}

trap cleanup INT TERM

msgfile=/tmp/msg$$
cat > $msgfile
# send the contents of $msgfile to the specified mail address...
rm $msgfile
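Since the real script waits for typed input, it is awkward to try out unattended. The following is a rough simulation under several assumptions of our own: it runs the script in a child sh inside a scratch directory made with mktemp, writes a fixed line instead of reading from cat, and has the script send itself TERM to stand in for the user aborting. It uses the portable cleanup() and [ -e ... ] forms so that it runs in shells other than ksh, and it exits with status 1 so the caller can tell the message was aborted.

```shell
workdir=$(mktemp -d)      # scratch directory, so dead.letter lands somewhere harmless
status=0
sh -s "$workdir" <<'EOF' || status=$?
cd "$1" || exit 2
cleanup() {
    # Save the partial message only if the temp file exists.
    if [ -e "$msgfile" ]; then
        mv "$msgfile" dead.letter
    fi
    exit 1
}
trap cleanup INT TERM
msgfile=./msg$$
printf 'half-written message\n' > "$msgfile"
kill -TERM $$             # simulate the user aborting mid-message
rm "$msgfile"             # never reached: cleanup exits first
EOF
printf 'exit status %s, saved: %s\n' "$status" "$(cat "$workdir/dead.letter")"
```

After the run, the scratch directory holds dead.letter with the partial message, and no msg file is left behind.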

8.4.3 Ignoring Signals

Sometimes a signal comes in that you don't want to do anything about. If you give the null string ("" or '') as the command argument to trap, then the shell will effectively ignore that signal. The classic example of a signal you may want to ignore is HUP (hangup), the signal the shell sends to all of your background processes when you log out.

HUP has the usual default behavior: it will kill the process that receives it. But there are bound to be times when you don't want a background job to terminate when you log out. For example, you may start a long compile or word processing job; you want to log out and come back later when you expect the job to be finished. Under normal circumstances, your background job will terminate when you log out. But if you run it in a shell environment where the HUP signal is ignored, the job will finish.

To do this, you could write a simple function that looks like this:

function ignorehup {
    trap "" HUP
    eval "$@"
}

We write this as a function instead of a script for reasons that will become clearer when we look in detail at subshells at the end of this chapter.
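The effect of the null command string is easy to check without logging out. In this sketch (our own, with printf standing in for print), a child sh ignores HUP and then sends that signal to itself:

```shell
survivor=$(sh <<'EOF'
trap "" HUP            # null command string: ignore HUP from here on
kill -HUP $$           # without the trap, this would terminate the shell
printf 'still alive\n'
EOF
)
printf '%s\n' "$survivor"
```

If you remove the trap line, the child shell is terminated by the signal before it can print anything.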

Actually, there is a UNIX command called nohup that does precisely this. The start script from the last chapter could include nohup:

eval nohup "$@" > logfile 2>&1 &

This prevents HUP from terminating your command and saves its standard and error output in a file. Actually, the following is just as good:

nohup "$@" > logfile 2>&1 &

If you understand why eval is essentially redundant when you use nohup in this case, then you have a firm grasp on the material in the previous chapter.

8.4.4 Resetting Traps

Another "special case" of the trap command occurs when you give a dash (-) as the command argument. This resets the action taken when the signal is received to the default, which usually is termination of the process.

As an example of this, let's return to Task 8.2, our mail program. After the user has finished sending the message, the temporary file is erased. At that point, since there is no longer any need to "clean up," we can reset the signal trap to its default state. The code for this, apart from function definitions, is:

trap abortmsg INT
trap cleanup TERM

msgfile=/tmp/msg$$
cat > $msgfile
# send the contents of $msgfile to the specified mail address...
rm $msgfile

trap - INT TERM

The last line of this code resets the handlers for the INT and TERM signals.
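To see that a reset trap really does fall back to the default action, consider this sketch. It is our own illustration, not the mail program: a child sh sets a handler for TERM, resets it with a dash, and then signals itself. As before, printf stands in for print.

```shell
status=0
output=$(sh <<'EOF'
trap 'printf "cleaning up\n"' TERM
trap - TERM            # back to the default action for TERM
kill -TERM $$          # terminates the shell; the old handler does not run
printf 'not reached\n'
EOF
) || status=$?
printf 'output: [%s], exit status: %s\n' "$output" "$status"
```

The child prints nothing, and its non-zero exit status shows that it was terminated by the signal rather than finishing normally.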

At this point you may be thinking that one could get seriously carried away with signal handling in a shell script. It is true that "industrial strength" programs devote considerable amounts of code to dealing with signals. But these programs are almost always large enough so that the signal-handling code is a tiny fraction of the whole thing. For example, you can bet that the real UNIX mail system is pretty darn bullet-proof.

However, you will probably never write a shell script that is complex enough, and that needs to be robust enough, to merit lots of signal handling. You may write a prototype for a program as large as mail in shell code, but prototypes by definition do not need to be bullet-proofed.

Therefore, you shouldn't worry about putting signal-handling code in every 20-line shell script you write. Our advice is to determine if there are any situations in which a signal could cause your program to do something seriously bad and add code to deal with those contingencies. What is "seriously bad"? Well, with respect to the above examples, we'd say that the case where HUP causes your job to terminate on logout is seriously bad, while the temporary file situation in our mail program is not.

The Korn shell has several new options to trap (with respect to the same command in most Bourne shells) that make it useful as an aid for debugging shell scripts. We'll cover these in the next chapter.