Chapter 9. Debugging Shell Programs
We hope that we have convinced you that
the Korn shell can be used as a serious Unix programming environment.
It certainly has plenty of features, control structures, etc.
But another essential part of a programming environment is a set
of powerful, integrated support tools.
For example,
there is a wide assortment of screen editors, compilers,
debuggers, profilers, cross-referencers, etc., for languages like
C, C++ and Java.
If you program in one of these languages,
you probably take such tools for granted, and you would
undoubtedly cringe at the thought of having to develop code with, say,
the ed editor and the adb machine-language debugger.
But what about programming support tools for the Korn shell? Of
course, you can use any editor you like, including vi and
Emacs. And because the shell is an interpreted language,
you don't need a compiler.[126]
But there are no other tools available. The most serious problem
is the lack of a debugger.
This chapter addresses that lack. The shell does have a few
features that help in debugging shell scripts; we'll see these in the
first part of the chapter.
The Korn shell also has a couple of
new features, not present in most Bourne shells, that make it
possible to implement a full-blown debugging tool.
We show these features; more importantly, we present kshdb,
a Korn shell debugger that uses them. kshdb is basic yet
quite usable, and its implementation serves as an extended example of
various shell programming techniques from throughout this book.
What sort of functionality do you need to debug a program?
At the most empirical level, you need a way of determining what
is causing your program to behave badly and where the
problem is in the code. You usually start with an obvious what
(such as an error message, inappropriate output, infinite loop, etc.),
try to work backwards until you find a what that is closer to
the actual problem (e.g., a variable with a bad value, a bad option
to a command), and eventually arrive at the exact where in your
program. Then you can worry about how to fix it.
Notice that these steps represent a process of starting with obvious
information and ending up with often obscure facts gleaned through
deduction and intuition. Debugging aids make it easier to deduce and
intuit by providing relevant information easily or even automatically,
preferably without modifying your code.
The simplest debugging aid (for any language)
is the output statement, print in
the shell's case. Indeed, old-time programmers debugged their
Fortran code by inserting WRITE cards into their decks.
You can debug by putting
lots of print statements in your code
(and removing them later), but you will
have to spend lots of time narrowing down not only what
exact information you want but also where you need to see it.
You will also probably have to wade through lots and lots of
output to find the information that you really want.
9.1.1. Set Options
Luckily, the shell has a few basic features that give you
debugging functionality beyond that of print.
The most basic of these are options to the set -o command
(as covered in Chapter 3). These options can also be
used on the command line when running a script, as
Table 9-1 shows.
The verbose option simply echoes (to standard error)
whatever input the shell gets. It
is useful for finding the exact point at
which a script is bombing. For example, assume your script looks
like this:
fred
bob
dave
pete
ed
ralph
Table 9-1. Debugging options
set -o option | Command-line option | Action
noexec | -n | Don't run commands; check for syntax errors only
verbose | -v | Echo commands before running them
xtrace | -x | Echo commands after command-line processing
None of these commands are standard Unix programs, and they all
do their work silently. Say the script
crashes with a cryptic message like "segmentation violation."
This tells you nothing about which command caused the error.
If you type ksh -v scriptname,
you might see this:
fred
bob
dave
segmentation violation
pete
ed
ralph
Now you know that dave is the probable culprit -- though it is also
possible that dave bombed because of something it expected
fred or bob to do (e.g., create an input file) that
they did incorrectly.
The xtrace option is more powerful: it echoes each command
and its arguments, after the command
has been through parameter substitution, command substitution,
and the other steps of command-line processing (as listed in
Chapter 7).
If necessary, the output is quoted in such a way as to allow it to be
reused later as input to the shell.
Here is an example:
$ set -o xtrace
$ fred=bob
+ fred=bob
$ print "$fred"
+ print bob
bob
$ ls -l $(whence emacs)
+ whence emacs
+ ls -l /usr/bin/emacs
-rwxr-xr-x 2 root root 3471896 Mar 16 20:17 /usr/bin/emacs
$
As you can see, xtrace starts each line it prints with +.
This is actually customizable: it's the value of the built-in shell variable
PS4.[127]
If you set PS4
to "xtrace-> "
(e.g., in your .profile or environment file),
you'll get
xtrace listings that look like this:
$ ls -l $(whence emacs)
xtrace-> whence emacs
xtrace-> ls -l /usr/bin/emacs
-rwxr-xr-x 2 root root 3471896 Mar 16 20:17 /usr/bin/emacs
$
An even better way of customizing PS4 is to use a
built-in variable we haven't seen yet: LINENO, which
holds the number of the currently running line in a shell script.
Put this line in your .profile or environment file:
PS4='line $LINENO: '
We use the same technique as we did with PS1 in
Chapter 3: using single quotes to postpone
the evaluation of the string until each time the shell prints the prompt.
This prints messages of the form
line N: in your
trace output.
You could even include the name of the shell
script you're debugging in this prompt by using
the positional parameter $0:
PS4='$0 line $LINENO: '
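To see the effect of a customized PS4 without changing your .profile, you can trace a few commands and capture the trace. The following is only a minimal sketch: the variable names tracefile, greeting, and trace are ours, mktemp is assumed to be available, and printf is used in place of print so that the fragment also runs under other POSIX-style shells.

```shell
# Temporary file that will receive the trace (mktemp is an assumption).
tracefile=$(mktemp)

# Single quotes postpone evaluation of $LINENO until each trace line is printed.
PS4='line $LINENO: '

# Trace only the commands inside the braces; the trace goes to standard
# error, which is redirected to the temporary file for the group.
{
    set -o xtrace
    greeting=hello
    : "$greeting, world"
    set +o xtrace
} 2> "$tracefile"

# Show what the shell recorded, then clean up.
trace=$(cat "$tracefile")
rm -f "$tracefile"
printf '%s\n' "$trace"
```

Each line of the captured trace begins with the expanded PS4 value, so the assignment to greeting is reported together with the line number at which it ran.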
As another example, say you are trying to track down a bug
in a script called fred that contains this code:
dbfmq=$1.fmq
...
fndrs=$(cut -f3 -d' ' $dfbmq)
You type fred bob to run it in the normal way, and it hangs.
Then you type ksh -x fred bob, and you see this:
+ dbfmq=bob.fmq
...
+ + cut -f3 -d
It hangs again at this point. You notice that cut doesn't
have a filename argument, which means that there must be something
wrong with the variable dbfmq. But it has executed the assignment
statement dbfmq=bob.fmq properly... ah-hah!
You made a typo in the variable name inside the command substitution
construct.[128]
You fix it, and the script works properly.
When set at the global level, the xtrace option applies
to the main script and to any POSIX-style functions (those created with the
name () syntax).
If the code you are trying to debug calls function-style
functions that are defined
elsewhere (e.g., in your .profile or environment file), you can
trace through these in the same way with an option to the typeset
command.
Just enter the command typeset -ft functname,
and the named function will be traced whenever it runs. Type
typeset +ft functname to turn tracing off.
You can also put set -o xtrace into the function body itself,
which is good when the function is within the script being debugged.
The last option is noexec, which reads in the shell
script and checks for syntax errors but doesn't execute anything. It's
worth using if your script is syntactically complex (lots of loops,
code blocks, string operators, etc.) and the bug has side effects (like
creating a large file or hanging up the system).
You can turn on these options with set -o in your shell scripts,
and, as explained in Chapter 3, turn them off
with set +o option.
For example, if you're debugging a
script with a nasty side effect, and you have localized
it to a certain chunk of code, you can precede that chunk with
set -o xtrace (and, perhaps, close it with
set +o xtrace) to watch it in more detail.
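As a small illustration of bracketing a suspect chunk, consider the sketch below. The function count_words and the variable n are hypothetical names invented for this example, and printf stands in for print so that the fragment is not tied to one shell.

```shell
# Hypothetical helper, used only for this illustration:
# prints the number of whitespace-separated words in its first argument.
count_words() {
    set -- $1          # deliberately unquoted so the argument is split into words
    printf '%s\n' "$#"
}

printf 'before the suspect chunk\n'      # this command is not traced

set -o xtrace                            # start tracing here
n=$(count_words "alpha beta gamma")
set +o xtrace                            # stop tracing

printf 'words: %s\n' "$n"                # not traced again
```

Only the commands between the two set commands appear in the trace (along with the set +o xtrace command itself), which keeps the output short enough to read.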
NOTE:
The noexec option is a "one-way" option.
Once turned on, you can't turn it off again! That's because the shell only
reads commands and doesn't execute them. This includes the
set +o noexec command you'd want to use
to turn the option off.
Fortunately, this only applies to shell scripts; the shell ignores
this option when it's interactive.
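A syntax check with -n might look like the following sketch. The file name held in broken and the variable verdict are our own, mktemp is assumed to be available, and sh is used only so that the fragment runs where ksh is not installed; with the Korn shell you would type ksh -n instead.

```shell
# Write a deliberately broken script: the if is never closed with fi.
broken=$(mktemp)
cat > "$broken" <<'EOF'
if true; then
    echo "this line must never run"
EOF

# -n is the command-line form of noexec: the script is parsed, not run.
# A syntax error makes the shell exit with a nonzero status.
if output=$(sh -n "$broken" 2>&1); then
    verdict="syntax ok"
else
    verdict="syntax error"
fi

rm -f "$broken"
printf '%s\n' "$verdict"
```

The exact wording of the error message collected in output differs from shell to shell, so the sketch relies only on the exit status.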
9.1.2. Fake Signals
A more sophisticated set of debugging aids is the shell's
"fake debugging signals," which can be used in trap statements to
get the shell to act under certain conditions. Recall from
the previous chapter that trap allows you to install some
code that runs when a particular signal is sent to your script.
Fake signals act like real ones, but they are generated by
the shell (as opposed to real signals, which the underlying operating
system generates). They represent runtime events that are likely to
be interesting to debuggers -- both human ones and software tools -- and
can be treated just like real signals within shell scripts.
The four fake signals and their meanings are listed in
Table 9-2.
Table 9-2. Fake signals
Fake signal | When sent
EXIT | The shell exits from a function or script
ERR | A command returns a non-zero exit status
DEBUG | Before every statement (after in ksh88)
KEYBD | When reading characters in the editing modes (not for debugging)
The KEYBD signal is not used for debugging.
It is an advanced feature, for which
we delay discussion until
Chapter 10.
9.1.2.1. EXIT
The EXIT trap, when set, runs its code when the function or
script within which it was set exits.
Here's a simple example:
function func {
    trap 'print "exiting from the function"' EXIT
    print 'start of the function'
}

trap 'print "exiting from the script"' EXIT
print 'start of the script'
func
If you run this script, you see this output:
start of the script
start of the function
exiting from the function
exiting from the script
In other words, the script starts by
setting the trap for its own exit.
Then it prints a message and finally
calls the function.
The function does the same -- sets a trap for its exit and prints a message.
(Remember that function-style functions can have their own local
traps that supersede any traps set by the surrounding script, while
POSIX functions share traps with the main script.)
The function then exits, which causes the shell to send
it the fake signal EXIT, which in turn runs the
code print "exiting from the function".
Then the script exits, and its own
EXIT trap code is run.
Note also that traps "stack;" the EXIT fake signal is sent to each running
function in turn as each more recently called function exits.
An EXIT trap occurs no matter how the script or function exits, whether
normally (by finishing the last statement),
by an explicit exit or return statement,
or by receiving a "real"
signal such as INT or TERM. Consider the following inane number-guessing
program:
trap 'print "Thank you for playing!"' EXIT

magicnum=$(($RANDOM%10+1))
print 'Guess a number between 1 and 10:'
while read guess'?number> '; do
    sleep 10
    if (( $guess == $magicnum )); then
        print 'Right!'
        exit
    fi
    print 'Wrong!'
done
This program picks a number between 1 and 10 by getting a random
number (via the built-in variable RANDOM;
see Appendix B), extracting the last
digit (the remainder when divided by 10), and adding 1. Then
it prompts you for a guess, and
after 10 seconds, it tells you if you guessed right.
If you did, the program exits with the message, "Thank
you for playing!", i.e., it runs the EXIT trap code.
If you were wrong, it prompts you
again and repeats the process until you get it right.
If you get bored with this little game
and hit CTRL-C while waiting for it to tell you
whether you were right, you also see the message.
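Because the EXIT trap runs however a script ends, it is a natural place for cleanup code. The following is a minimal sketch rather than a recommended recipe: the variable names scratch and status are ours, mktemp is assumed to be available, and the child script is run with sh -c purely to keep the example self-contained.

```shell
# Scratch file name; mktemp is assumed to be available on your system.
scratch=$(mktemp)
export scratch

# The child script sets an EXIT trap, writes to the scratch file, and then
# leaves through an explicit exit. The trap still removes the file.
sh -c '
    trap "rm -f \"$scratch\"" EXIT
    printf "partial results\n" > "$scratch"
    exit 3
'
status=$?

printf 'child exited with status %s\n' "$status"
if [ -e "$scratch" ]; then
    printf 'scratch file still exists\n'
else
    printf 'scratch file was removed by the EXIT trap\n'
fi
```

The child's exit status is still the one given to exit, since the trap code itself does not call exit.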
9.1.2.2. ERR
The fake signal ERR enables you to run code whenever a command
in the surrounding script or function exits with non-zero status.
Trap code for ERR can take advantage of the built-in
variable ?, which holds the exit status of the previous command.
It survives the trap and is accessible at the beginning of the
trap-handling code.
A simple but effective use of this is to put the following code
into a script you want to debug:
function errtrap {
    typeset es=$?
    print "ERROR: Command exited with status $es."
}

trap errtrap ERR
The first line of the function saves the nonzero exit status in the local variable es.
For example, if the shell can't find a command, it returns status 127.
If you put the code in a script with a line of gibberish
(like "lskdjfafd"), the shell responds with:
scriptname: line N: lskdjfafd: not found
ERROR: Command exited with status 127.
N is the number of the line in the script that contains
the bad command. In this case, the shell prints the line number
as part of its own error-reporting mechanism, since the error
was a command that the shell could not find. But if the nonzero
exit status comes from another program, the shell doesn't report
the line number. For example:
function errtrap {
    typeset es=$?
    print "ERROR: Command exited with status $es."
}

trap errtrap ERR

function bad {
    return 17
}

bad
This only prints ERROR: Command exited with status 17.
It would obviously be an improvement to include the line number
in this error message.
The built-in variable LINENO exists,
but if you use it inside a function,
it evaluates to the line number in the function, not in the overall
file. In other words, if you used $LINENO in
the print statement
in the errtrap routine, it would always evaluate to 2.
To get around this problem, we simply pass $LINENO as an
argument to the trap handler, surrounding it in single quotes
so that it doesn't get evaluated until the fake signal actually
comes in:
function errtrap {
    typeset es=$?
    print "ERROR line $1: Command exited with status $es."
}

trap 'errtrap $LINENO' ERR
...
If you use this with the above example, the result is the message,
ERROR line 12: Command exited with status 17. This is
much more useful. We'll see a variation on this technique shortly.
This simple code is actually not a bad all-purpose debugging
mechanism. It takes into account that a nonzero exit status
does not necessarily indicate an undesirable condition or event:
remember that every control construct with a conditional
(if, while, etc.) uses a nonzero exit status to
mean "false." Accordingly, the shell doesn't generate ERR traps
when statements or expressions in the "condition" parts of control
structures produce nonzero exit statuses.
But a disadvantage is that exit statuses are not as uniform
(or even as meaningful) as they should be, as we explained in
Chapter 5. A particular exit status need not
say anything about the nature of the error or even
that there was an error.
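The rule that conditions do not raise ERR can be checked with a short experiment. This is a sketch under our own assumptions: err_count is a hypothetical counter added for the demonstration, the handler is written with the name () syntax and printf so that it also runs in other shells that support the ERR trap, and the script is assumed not to be running under the errexit option.

```shell
err_count=0

# Hypothetical handler: counts the traps and reports the line passed as $1.
errtrap() {
    es=$?                                  # save the failing command's status first
    err_count=$((err_count + 1))
    printf 'ERROR line %s: Command exited with status %s.\n' "$1" "$es"
}

trap 'errtrap $LINENO' ERR

if false; then :; fi          # nonzero status in an if condition: no trap
while false; do :; done       # nonzero status in a while condition: no trap
false                         # ordinary command: the ERR trap runs once

trap - ERR                    # remove the handler again
printf 'ERR trap ran %s time(s)\n' "$err_count"
```

Only the bare false command is reported; the two conditions fail silently, which is what keeps this kind of handler from flooding you with messages about ordinary tests.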
9.1.2.4. Signal delivery order
It is possible for multiple signals to arrive simultaneously (or close to it).
In that case, the shell runs the trap commands in the following order:
1. DEBUG
2. ERR
3. Real Unix signals, in order of signal number
4. EXIT
9.1.3. Discipline Functions
In Chapter 4, we introduced
the Korn shell's compound variable notation, such as ${person.name}.
Using this notation,
ksh93 provides special functions, called
discipline functions, that give you control over
variables when they are referenced, assigned to, and unset.
Simple versions of such functions might look like this:
dave=dave                           # Create the variable

function dave.set {                 # Called when dave is assigned to
    print "dave just got assigned '${.sh.value}'"
}

function dave.get {                 # Called when $dave retrieved
    print "dave's value referenced, it's '$dave'"   # this is safe
    .sh.value="dave was here"       # Change what $dave returns, dave not changed
}

function dave.unset {               # Called when dave is unset
    print "goodbye dave!"
    unset dave                      # actually make dave go away
}
NOTE:
The unset discipline
function must actually use the unset
command to unset the variable -- this does not cause an infinite loop.
Otherwise, the variable won't be unset,
which in turn leads to very surprising behavior.
Here is what happens once all of these functions are in place:
$ print $dave
dave's value referenced, it's 'dave'                      # From dave.get
dave was here                                             # From print
$ dave='who is this dave guy, anyway?'
dave just got assigned 'who is this dave guy, anyway?'    # From dave.set
$ unset dave
goodbye dave!                                             # From dave.unset
$ print $dave

$
Discipline functions may only be applied to global variables.
They may not be used with local variables -- those you create
with typeset inside a function-style
function.
Table 9-3 summarizes the built-in discipline
functions.
Table 9-3. Predefined discipline functions
Name | Purpose
variable.get | Called when a variable's value is retrieved. Assigning to .sh.value changes the value returned but not the variable itself.
variable.set | Called when a variable is assigned to. ${.sh.value} is the new value being assigned. Assigning to .sh.value changes the value being assigned.
variable.unset | Called when a variable is unset. This function must use unset on the variable to actually unset it.
As we've just seen, within the discipline functions, there are two
special variables that the shell sets which give you information,
as well as one variable that you can set to change how the shell behaves.
Table 9-4 describes these variables and what
they do.
Table 9-4. Special variables for use in discipline functions
Variable | Purpose
.sh.name | The name of the variable for which the discipline function is being run.
.sh.subscript | The current subscript for an array variable. (The discipline functions apply to the entire array, not each subscripted element.)
.sh.value | The new value being assigned in a set discipline function. If assigned to in a get discipline function, changes the value returned.
At first glance, it's not clear what the value of discipline functions
is. But they're perfect for implementing a very useful debugger
feature, called watchpoints.
We're now ready to get down to writing our shell script debugger.
Copyright © 2003 O'Reilly & Associates. All rights reserved.