Chapter 6. Command-Line Options and Typed Variables
You should have a healthy grasp of shell programming techniques
now that you have gone through the previous chapters. What you
have learned up to this point
enables you to write many nontrivial, useful shell scripts and functions.
Still, you may have noticed some remaining gaps in the knowledge
you need to write shell code that behaves like the Unix commands you
are used to. In particular, if you are an experienced Unix user,
it might have occurred to you
that none of the example scripts shown so far have the
ability to handle options (preceded by a dash (-))
on the command line.
And if you
program in a conventional language like C or Pascal, you will have
noticed that the only type of data that we have seen in shell variables
is character strings; we haven't seen how to do arithmetic, for example.
These capabilities are certainly crucial to the shell's ability to
function as a useful Unix programming language. In this chapter, we
show how the Korn shell supports these and related features.
We have already seen many examples of the positional parameters
(variables called 1, 2, 3, etc.)
that the shell uses to store the command-line
arguments to a shell script or function when it runs. We have also
seen related variables like * and @
(for the string(s) of all arguments)
and # (for the number of arguments).
Indeed, these variables hold all the information on the user's
command line. But consider what happens when options are involved.
Typical Unix commands have the form command
[-options] args,
meaning that there can be zero or more options. If a shell script
processes the command fred bob pete, then
$1 is "bob" and $2 is "pete".
But if the command is fred -o bob pete, then
$1 is -o,
$2 is "bob", and $3 is
"pete".
You might think you could write code like this to handle it:
if [[ $1 == -o ]]; then
    code that processes the -o option
    1=$2
    2=$3
fi
normal processing of $1 and $2...
But this code has several problems. First, assignments like
1=$2 are illegal because positional parameters are read-only.
Even if they were legal, another problem is that
this kind of code imposes limitations on how many arguments
the script can handle -- which is very unwise. Furthermore,
if this command had several possible options, the
code to handle all of them would get very messy very quickly.
6.1.1. shift
Luckily, the shell provides a way around this problem.
The command shift performs the function of:
1=$2
2=$3
...
for every argument, regardless of how many there are. If you supply
a numeric argument
to shift, it shifts the arguments that
many times over; for example, shift 3 has this effect:
1=$4
2=$5
...
This leads immediately to some code that handles a single option
(call it -o) and arbitrarily many arguments:
if [[ $1 == -o ]]; then
    process the -o option
    shift
fi
normal processing of arguments ...
After the if construct, $1, $2,
etc., are set to the correct arguments,
and $# is automatically adjusted, as well.
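As a minimal runnable sketch of this pattern, the hypothetical function show_args below strips an optional -o and then reports what is left. The function name and the output format are our own assumptions, and we use printf rather than print so that the sketch should also run under bash.

```shell
# Hypothetical demo function: handles one optional -o flag, then
# reports the remaining arguments.
show_args() {
    o_given=no
    if [[ $1 == -o ]]; then
        o_given=yes     # process the -o option
        shift           # discard -o; the old $2 becomes $1, and so on
    fi
    # $# and the positional parameters now describe only the real arguments
    printf '%s %s %s\n' "o=$o_given" "count=$#" "first=$1"
}

show_args -o bob pete
show_args bob pete
```

Both calls report two arguments with bob first; only the o= field differs.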
We can use shift together with the programming features
we have seen so far to implement simple option schemes. However,
we will need additional help when things get more complex.
The getopts built-in command, which we introduce
later, provides this help.
shift by itself gives us enough power to
implement the -N
option to the highest script we saw in Task 4-1.
Recall that this script takes an input file that lists artists
and the number of albums you have by them. It sorts the list
and prints out the N highest numbers, in descending order.
The code that does the actual data processing is:
filename=$1
howmany=${2:-10}
sort -nr $filename | head -$howmany
Our original syntax for calling this script was
highest filename
[N],
where N defaults to
10 if omitted. Let's change this to a more conventional Unix syntax,
in which options are given before arguments:
highest [-N]
filename. Here is how we would write
the script with this syntax:
if [[ $1 == -+([0-9]) ]]; then
    howmany=$1
    shift
elif [[ $1 == -* ]]; then
    print 'usage: highest [-N] filename'
    exit 1
else
    howmany="-10"
fi
filename=$1
sort -nr $filename | head $howmany
In this code, the option is considered to be supplied if $1
matches the pattern -+([0-9]). This uses one of the Korn shell's
regular expression operators, which we saw in Chapter 4.
Notice that we didn't surround the pattern with quotes
(even double quotes); if we did, the shell would interpret it literally,
not as a pattern. This pattern means
"A dash followed by one or more digits." If $1 matches,
then we assign it to the variable howmany.
If $1 doesn't match, we test to see if it's an option at all,
i.e., if it matches the pattern -*. If it does, then it's invalid;
we print an error message and exit with error status. If we reach the final
(else) case, we
provide the default value for howmany and
assume that $1 is a filename and
treat it as such in the ensuing code. The rest of the script
processes the data as before.
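The three-way test can be tried in isolation. The sketch below is not part of the highest script: classify_arg is a hypothetical helper that only reports which branch its argument would take, and the last few lines illustrate the point about quoting. It assumes a shell whose [[ ]] accepts the +(...) pattern operator (ksh, or a reasonably recent bash).

```shell
# Hypothetical helper: reports how highest would treat its first argument.
classify_arg() {
    if [[ $1 == -+([0-9]) ]]; then
        printf 'count option: %s\n' "${1#-}"   # strip the dash: -25 -> 25
    elif [[ $1 == -* ]]; then
        printf 'invalid option: %s\n' "$1"
    else
        printf 'filename: %s\n' "$1"
    fi
}

classify_arg -25
classify_arg -x
classify_arg albums

# Quoting the pattern makes it a literal string, so -25 no longer matches.
arg=-25
if [[ $arg == "-+([0-9])" ]]; then
    printf 'quoted pattern matched\n'
else
    printf 'quoted pattern did not match\n'
fi
```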
We can extend what we have learned so far
to a general technique for handling multiple
options. For the sake of concreteness, assume that our script
is called bob and we want to handle
the options -a, -b, and -c:
while [[ $1 == -* ]]; do
    case $1 in
        -a ) process option -a ;;
        -b ) process option -b ;;
        -c ) process option -c ;;
        *  ) print 'usage: bob [-a] [-b] [-c] args ...'
             exit 1 ;;
    esac
    shift
done
normal processing of arguments ...
This code checks $1 repeatedly as long as it starts with a dash
(-).
Then the case construct
runs the appropriate code depending on which option $1 is.
If the option is invalid (i.e., if it starts with a dash but
isn't -a, -b, or
-c), the script prints a usage message
and returns with an error exit status. After each option is
processed, the arguments are shifted over. The result is that
the positional parameters are set to the actual arguments when
the while loop finishes.
Notice that by generalizing this code, you can
handle options of arbitrary
length, not just one letter (e.g., -fred instead of -a).
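Here is one way the loop might look once the placeholders are filled in. This is only a sketch under our own assumptions: bob_parse is a hypothetical function, "processing" an option just records its name, and -fred is added to show a multi-letter option. It returns rather than exits on error so that it can be tried interactively.

```shell
# Hypothetical demo: parse -a, -b, -c and the longer option -fred,
# then report which options were seen and which arguments remain.
bob_parse() {
    seen=""
    while [[ $1 == -* ]]; do
        case $1 in
            -a | -b | -c ) seen="$seen ${1#-}" ;;
            -fred )        seen="$seen fred" ;;
            * ) printf 'usage: bob [-a] [-b] [-c] [-fred] args ...\n' >&2
                return 1 ;;
        esac
        shift
    done
    printf 'options:%s args: %s\n' "$seen" "$*"
}

bob_parse -a -fred -c pete
```

When the loop ends, the options have all been shifted away and only pete remains in the positional parameters.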
6.1.3. getopts
So far, we have a complete, though still constrained, way
of handling command-line options. The above code does not allow
a user to combine options with a single dash, e.g.,
-abc instead of -a -b -c. It
also doesn't allow the user to
specify arguments to options
without a space in between, e.g.,
-barg in addition to -b arg.
The shell provides a built-in way
to deal with multiple complex options without these constraints.
The built-in command getopts
can be used as the condition of the while in an option-processing
loop. Given a specification of which options are valid
and which require their own arguments, it sets up the
body of the loop to process each option in turn.
getopts takes at least two arguments. The first
is a string that can contain letters
and colons. Each letter is a valid option; if a letter is followed
by a colon, the option requires an argument.
If the letter is followed by a #, the option
requires a numeric argument.
The : or # may be followed
by [description],
i.e., a descriptive string enclosed in square brackets that is used
when generating usage error messages. If you append a space with more
descriptive text to the
list of option characters, that text is also printed in error messages.
getopts picks
options off the command line and assigns each one (without the
leading dash) to a variable whose
name is getopts's second argument.
As long as there are options
left to process, getopts returns exit status 0; when the
options are exhausted, it returns exit status 1, causing the while
loop to exit.
By default, getopts loops through "$@",
the quoted list of command line arguments.
However, you may supply additional arguments to getopts,
in which case it uses those arguments, instead.
getopts does a few other things that make option processing
easier; we'll encounter them as we examine
how to use getopts in the preceding example:
while getopts ":ab:c" opt; do
    case $opt in
        a  ) process option -a ;;
        b  ) process option -b
             $OPTARG is the option's argument ;;
        c  ) process option -c ;;
        \? ) print 'usage: bob [-a] [-b barg] [-c] args ...'
             exit 1 ;;
    esac
done
shift $(($OPTIND - 1))
normal processing of arguments ...
The call to getopts in the while
condition sets up the loop to accept the options -a,
-b, and -c, and specifies
that -b takes an argument. (We will explain the
":" that starts the option string in a moment.) Each
time the loop body is executed, it has the latest option
available, without a dash (-), in the variable
opt.
If the user types an
invalid option, getopts normally prints an
error message (of the form
cmd: -o: unknown option)
and sets opt to ?.
getopts finishes processing all its options,
and if an error was encountered, the shell exits.
However -- now here's an obscure kludge -- if you begin the
option letter string with a colon, getopts won't print the message,
and the shell will not exit.
This allows you to handle error messages on your own.
You may either
supply the leading colon and provide your own error
message in a case that handles ? and exits manually, as above,
or you may provide descriptive text within the call to getopts,
and let the shell handle printing the error message.
In the latter case, the shell will also automatically exit upon
encountering an invalid option.
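The do-it-yourself approach can be sketched as follows. The function name check_opts and the wording of the messages are our own. The sketch also relies on two details that POSIX specifies for an option string with a leading colon but that the text above does not cover: the offending option letter is placed in OPTARG, and a missing option argument is reported by setting the variable to a colon rather than a question mark.

```shell
# Hypothetical demo of "silent" error handling: the leading colon in the
# option string suppresses getopts's own message, so we report errors here.
check_opts() {
    OPTIND=1    # restart parsing; bash does not reset this per function call
    while getopts ":ab:c" opt; do
        case $opt in
            a | c ) ;;      # valid flags: nothing to do in this demo
            b )     ;;      # -b's argument would be in $OPTARG
            \? ) printf 'unknown option: -%s\n' "$OPTARG"
                 return 1 ;;
            : )  printf 'option -%s needs an argument\n' "$OPTARG"
                 return 1 ;;
        esac
    done
    printf 'options ok\n'
}

check_opts -a -b x -c
check_opts -z
check_opts -b
```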
We have modified the code in the case construct to
reflect what getopts does.
But notice that there are no more shift statements inside the
while loop: getopts does not rely on
shifts to
keep track of where it is. It is unnecessary to shift arguments
over until getopts is finished,
i.e., until the while
loop exits.
If an option has an argument, getopts stores it in the variable
OPTARG, which can be used in the code that processes the
option.
The one shift statement left is after the while loop.
getopts stores in the variable OPTIND the number of
the next argument to be processed; in this case, that's the number
of the first (non-option) command-line argument. For example,
if the command line were bob -ac pete, then $OPTIND
would be "2". If it were bob -a -c pete,
then $OPTIND would be "3". (We use -c rather than -b here
because -b would take pete as its own argument.)
OPTIND is reinitialized to 1 whenever you run a function,
which allows you to use getopts within a function body.
The expression $(($OPTIND - 1)) is an
arithmetic expression (as we'll see later in this chapter) equal
to $OPTIND minus 1. This value is used as the argument to
shift. The result is that the correct number of arguments
is shifted out of the way, leaving the "real" arguments
as $1, $2, etc.
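To watch OPTIND at work, here is a small sketch that prints its value just before the shift. The function bob_getopts is hypothetical; it resets OPTIND explicitly so that it should behave the same way in shells that do not reinitialize the variable for each function call.

```shell
# Hypothetical demo: parse options, then show OPTIND and the arguments
# left over once the options have been shifted away.
bob_getopts() {
    OPTIND=1    # explicit reset so the sketch also works in bash functions
    while getopts ":ab:c" opt; do
        case $opt in
            a | c ) ;;                  # flags without arguments
            b )     barg=$OPTARG ;;     # -b's argument
            * )     return 1 ;;         # invalid option or missing argument
        esac
    done
    printf 'OPTIND=%s' "$OPTIND"
    shift $((OPTIND - 1))
    printf ' remaining: %s\n' "$*"
}

bob_getopts -ac pete            # OPTIND=2 remaining: pete
bob_getopts -a -c pete          # OPTIND=3 remaining: pete
bob_getopts -a -b x pete dave   # OPTIND=4 remaining: pete dave
```

In the last call, -b consumes x as its argument, so three words are shifted away before pete becomes $1.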
Before we continue, now is a good time to summarize everything
that getopts does (including some points
not mentioned yet):
1. If given the -a option and an argument,
getopts uses that argument as the program name in any error messages,
instead of the default, which is the name of the script.
This is most useful if you are using getopts within
a function, where $0 is the name of the function.
In that case, it's less confusing if the error message uses the script name
instead of the function name.
2. Its first (non-option) argument is a string containing all valid option letters.
If an option requires an argument, a colon follows its letter in
the string. An initial colon causes getopts not to print an
error message when the user gives an invalid option.
3. Its second argument is the name of a variable that holds
each option letter (without any leading dash) as it is processed.
Upon encountering an error, this variable will contain a literal ?
character.
4. Following an option letter with a # instead
of a colon indicates that the option takes a numeric argument.
5. When an option takes an argument (the option letter is followed
by either a colon or a # symbol), appending a question
mark indicates that the option's argument is optional (i.e., not
required).
6. If additional arguments are given on the getopts
command line after the option string and variable name, they are used instead
of "$@".
7. If an option takes an argument, the argument is stored in the variable
OPTARG.
8. The variable OPTIND contains a number equal to the next
command-line argument to be processed. After getopts is done,
it equals the number of the first "real" argument.
9. If the first character in the option string is +
(or the second character after a leading colon), then options may start
with + as well. In this case, the option variable
will have a value that starts with +.
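The point about additional arguments is easy to overlook, so here is a short sketch of it. The names parse_defaults and default_opts are purely illustrative; the idea is that getopts scans the words we hand it and ignores the function's own positional parameters.

```shell
# Hypothetical demo: parse options held in a variable instead of "$@".
# default_opts is an illustrative name, not something the shell defines.
parse_defaults() {
    default_opts="-a -c"
    OPTIND=1
    found=""
    # $default_opts is deliberately unquoted so that it splits into words
    while getopts ":abc" opt $default_opts; do
        found="$found$opt"
    done
    printf 'found=%s\n' "$found"
}

parse_defaults -b these arguments are ignored
```

Even though the function is called with -b, only the a and c options from default_opts are seen.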
getopts can do much, much more than described here.
See Appendix B, which provides the full story.
The advantages of getopts are that it minimizes extra code
necessary to process options and fully supports the standard command option
syntax as specified by POSIX.
As a more concrete example, let's return to
our C compiler front-end (Task 4-2). So far,
we have given our script the ability to process C source files
(ending in .c), assembly code files (.s), and object code
files (.o). Here is the latest version of the script:
objfiles=""
for filename in "$@"; do
case $filename in
*.c )
objname=${filename%.c}.o
ccom "$filename" "$objname" ;;
*.s )
objname=${filename%.s}.o
as "$filename" "$objname" ;;
*.o )
objname=$filename ;;
* )
print "error: $filename is not a source or object file."
exit 1 ;;
esac
objfiles+=" $objname"
done
ld $objfiles
Now we can give the script the ability to handle options. To know what
options we'll need, we have to discuss further what compilers do.
6.1.3.1. More about C compilers
C compilers on typical modern Unix systems tend to have a bewildering
array of options. To make life simple, we'll limit ourselves to the
most widely used ones.
Here's what we'll implement. All compilers provide the ability
to eliminate the final linking step, i.e., the call to the linker
ld. This is useful for compiling C code into
object code files that will be linked later, and for taking advantage
of the compiler's error checking separately before trying to link. The
-c option (compile only) suppresses the link step,
producing only the compiled object code files.
C compilers are also capable of including lots of extra information in
an object code file that can be used by a debugger (though it is ignored by
the linker and the running program). If you don't know what a
debugger is, see Chapter 9.
The debugger needs lots of information
about the original C code to be able to do its job; the
option -g directs the compiler to include this information in
its object-code output.
If you aren't already familiar with Unix C compilers, you may have
thought it strange when you saw in the last chapter that the linker
puts its output (the executable program) in a file called a.out.
This convention is a historical relic that no one ever bothered to
change.
Although it's certainly possible to change the executable's
name with the mv command, the C compiler provides the option
-o filename,
which uses filename instead of a.out.
Another option we will support here has to do with
libraries. A library is a collection of object
code, some of which is to be included in the executable at
link time. (This is in contrast to a precompiled object code file,
all of which is linked in.) Each library includes
a large amount of object code that supports a certain type of interface
or activity; typical Unix systems have libraries for things like
networking, math functions, and graphics.
Libraries are extremely useful as building blocks that help programmers
write complex programs without having to "reinvent the wheel" every time.
The C compiler option -l name
tells the linker to include whatever
code is necessary from the library name
in the executable it builds.
One particular library called c
(the file libc.a) is always included. This is known
as the C runtime library; it contains code for
C's standard input and output capability, among other things.
(While Unix compilers normally take library specifications after
the list of object files, our front-end treats them just like any other
option, meaning that they must be listed before the object files.)
Finally, it is possible for a good C compiler to do certain things
that make its output object code smaller and more efficient. Collectively,
these things are called optimization. You can think of an
optimizer as an extra step in the compilation process
that looks back at the
object-code output and changes it for the better. The option -O
invokes the optimizer.
Table 6-1
summarizes the options we will build into our C compiler
front-end.
Table 6-1. Popular C compiler options
Option | Meaning
-c | Produce object code only; do not invoke the linker
-g | Include debugging information in object code files
-l lib | Include the library lib when linking
-o exefile | Produce the executable file exefile instead of the default a.out
-O | Invoke the optimizer
You should also bear in mind this information about the options:
The options -o and -l lib
are merely passed on to the
linker (ld), which processes them on its own.
The -l lib option can be used multiple times to link in
multiple libraries.
On most systems, ld requires that library options
come after object files on the command line.
(This also violates the conventions we've been working so hard
to adhere to.)
In addition, the order of libraries on the command line matters.
If a routine in libA.a references another
routine from libB.a,
then libA.a must appear first on the command
line (-lA -lB).
This implies that
the C library (libc.a) has to be loaded last,
since routines in other libraries almost always depend upon the
standard routines in the C library.
The -g option is passed to the ccom command
(the program that does the actual C compilation).
We will assume that the optimizer is a separate program called
optimize that accepts an object file as argument and optimizes
it "in place," i.e., without producing a separate output file.
For our front-end, we've chosen to let the shell handle printing
the usage message.
Here is the code for the script occ that includes option processing:
# initialize option-related variables
do_link=true
debug=""
link_libs=""
clib="-lc"
exefile=""
opt=false
# process command-line options
while getopts "cgl:[lib]o:[outfile]O files ..." option; do
    case $option in
        c ) do_link=false ;;
        g ) debug="-g" ;;
        l ) link_libs+=" -l $OPTARG" ;;
        o ) exefile="-o $OPTARG" ;;
        O ) opt=true ;;
    esac
done
shift $(($OPTIND - 1))
# process the input files
objfiles=""
for filename in "$@"; do
    case $filename in
        *.c )
            objname=${filename%.c}.o
            ccom $debug "$filename" "$objname"
            if [[ $opt == true ]]; then
                optimize "$objname"
            fi ;;
        *.s )
            objname=${filename%.s}.o
            as "$filename" "$objname"
            if [[ $opt == true ]]; then
                optimize "$objname"
            fi ;;
        *.o )
            objname=$filename ;;
        * )
            print "error: $filename is not a source or object file."
            exit 1 ;;
    esac
    objfiles+=" $objname"
done
if [[ $do_link == true ]]; then
    ld $exefile $objfiles $link_libs $clib
fi
Let's examine the option-processing part of this code.
The first several lines initialize variables that we use later
to store the status of each of the options. We use "true" and
"false" for truth values for readability; they are just strings
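and otherwise have no special meaning. The initializations
reflect these assumptions:
1. We will want to link (do_link is "true").
2. We will not want the compiler to include debugging information
(debug is empty).
3. The only library we will need is the C runtime library, which is
always linked in (link_libs is empty; clib is "-lc").
4. The executable will go in the linker's default file, a.out
(exefile is empty).
5. We will not want to invoke the optimizer (opt is "false").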
The while, getopts, and case
constructs process the options in the
same way as the previous example. Here is what the code that
handles each option does:
If the -c option is given,
the do_link flag is set to "false,"
which causes the if
condition at the end of the script to be false, meaning that the
linker will not run.
If -g is given,
the debug variable is set to "-g".
This is passed on the command line to the compiler.
Each -l lib that is given is appended to the variable
link_libs,
so that when the while loop exits, $link_libs
is the entire string of -l options. This string is passed
to the linker.
If -o file is given,
the exefile variable is
set to "-o file". This string is passed to the linker.
If -O is specified, the opt flag is set to
"true." This specification
causes the conditional if [[ $opt == true ]] to be true,
which means that the optimizer will run.
The remainder of the code is a modification of the for loop
we have already seen; the modifications are direct results of the
above option processing and should be self-explanatory.
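If you want to experiment with the option handling without a real compiler at hand, the following sketch may help. It is not the occ script itself but a variation under several assumptions of ours: ccom, as, optimize, and ld are replaced by stand-in functions that merely print the command they would run, the script body is wrapped in a function that returns instead of exiting, and a plain POSIX-style option string with a hand-written usage message takes the place of the descriptive option string so that the sketch should also run under bash.

```shell
# Stand-ins for the real tools (hypothetical): each one just prints the
# command line it would have run.
ccom()     { printf 'ccom %s\n' "$*"; }
as()       { printf 'as %s\n' "$*"; }
optimize() { printf 'optimize %s\n' "$*"; }
ld()       { printf 'ld %s\n' "$*"; }

occ() {
    # defaults: link, no debugging, only the C library, a.out, no optimizer
    do_link=true debug="" link_libs="" clib="-lc" exefile="" opt=false
    OPTIND=1
    while getopts ":cgl:o:O" option; do
        case $option in
            c ) do_link=false ;;
            g ) debug="-g" ;;
            l ) link_libs="$link_libs -l $OPTARG" ;;
            o ) exefile="-o $OPTARG" ;;
            O ) opt=true ;;
            * ) printf 'usage: occ [-cgO] [-l lib] [-o outfile] files ...\n' >&2
                return 1 ;;
        esac
    done
    shift $((OPTIND - 1))

    objfiles=""
    for filename in "$@"; do
        case $filename in
            *.c ) objname=${filename%.c}.o
                  ccom $debug "$filename" "$objname"
                  if [[ $opt == true ]]; then optimize "$objname"; fi ;;
            *.s ) objname=${filename%.s}.o
                  as "$filename" "$objname"
                  if [[ $opt == true ]]; then optimize "$objname"; fi ;;
            *.o ) objname=$filename ;;
            * )   printf 'error: %s is not a source or object file.\n' "$filename" >&2
                  return 1 ;;
        esac
        objfiles="$objfiles $objname"
    done

    # the variables are left unquoted on purpose so they split into words
    if [[ $do_link == true ]]; then
        ld $exefile $objfiles $link_libs $clib
    fi
}

occ -g -O -l m -o prog main.c util.o
```

With the stand-ins in place, the call at the end prints the ccom, optimize, and ld command lines in the order the real tools would be invoked, with the library options after the object files.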
Copyright © 2003 O'Reilly & Associates. All rights reserved.