Chapter 7. Input/Output and Command-Line Processing
The past few chapters have gone into detail about various shell
programming techniques, mostly focused on the flow of data and
control through shell programs. In this chapter,
we'll switch the focus to two related topics.
The first is the shell's mechanisms for doing file-oriented input
and output. We present information that expands on what you
already know about the shell's basic I/O redirectors.
Second, we zoom in and talk about I/O at the line and word level.
This is a fundamentally different topic, since it involves moving
information between the domains of files/terminals and shell variables.
print and
command substitution are two ways of doing this that we've seen so far.
Our discussion of line and word I/O then leads into a more detailed
explanation of how the shell processes command lines.
This information is necessary so that you can understand exactly
how the shell deals with quotation, and so that you can appreciate
the power of an advanced command called eval, which we cover
at the end of the chapter.
In Chapter 1 you learned about the shell's basic I/O redirectors,
<, >, and |.
Although these are enough to get you
through 95% of your Unix life, you should know that the Korn shell
supports a total of 20 I/O redirectors.
Table 7-1
lists them, including the three we've already seen.
Although some of the rest are useful, others are mainly for
systems programmers.
We will wait until the next chapter to discuss
the last three, which,
along with >| and <<<, are not present in most Bourne shell
versions.
Table 7-1. I/O redirectors
| Redirector | Function |
|---|---|
| > file | Direct standard output to file |
| < file | Take standard input from file |
| cmd1 \| cmd2 | Pipe; take standard output of cmd1 as standard input to cmd2 |
| >> file | Direct standard output to file; append to file if it already exists |
| >\| file | Force standard output to file even if noclobber is set |
| <> file | Open file for both reading and writing on standard input[90] |
| << label | Here-document; see text |
| <<- label | Here-document variant; see text |
| <<< word | Here-string; see text |
| n> file | Direct output file descriptor n to file |
| n< file | Set file as input file descriptor n |
| <&n | Duplicate standard input from file descriptor n |
| >&n | Duplicate standard output to file descriptor n |
| <&n- | Move file descriptor n to standard input |
| >&n- | Move file descriptor n to standard output |
| <&- | Close the standard input |
| >&- | Close the standard output |
| \|& | Background process with I/O from parent shell |
| n<&p | Move input from coprocess to file descriptor n |
| n>&p | Move output to coprocess to file descriptor n |
[90] Normally, files opened with < are opened read-only.
Notice that some of the redirectors in
Table 7-1
contain a digit n
and that their descriptions contain the term file descriptor;
we'll cover that in a little while.
(In fact, any redirector that starts with < or
> may be used with a file descriptor; this is omitted
from the table for simplicity.)
The first two new redirectors, >> and >|, are simple
variations on the standard output redirector >. The >>
appends to the output file (instead of overwriting it)
if it already exists; otherwise it acts
exactly like >.
A common use of >> is for adding
a line to an initialization file (such as .profile or
.mailrc) when you don't want to bother with a text editor. For example:
$ cat >> .mailrc
> alias fred frederick@longmachinename.longcompanyname.com
> ^D
$
As we saw in Chapter 1,
cat without an argument uses standard input as its input. This
allows you to type the input and end it with CTRL-D on its own line.
The alias line will be appended to the file .mailrc
if it already exists; if it doesn't, the file is created with
that one line.
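Both halves of that behavior can be checked without typing at the terminal. The following is a minimal sketch, assuming a scratch directory made with mktemp; the names workdir and notes are made up for illustration, and printf is used instead of print so that the lines also run in shells other than ksh:

```shell
# Scratch directory so the example does not touch real files (hypothetical names).
workdir=$(mktemp -d)
notes=$workdir/notes.txt

# The file does not exist yet, so >> creates it, exactly as > would.
printf 'first line\n' >> "$notes"

# The file exists now, so >> adds to the end instead of overwriting.
printf 'second line\n' >> "$notes"

# Had the second command used > instead, only one line would remain.
line_count=$(wc -l < "$notes")
last_line=$(tail -n 1 "$notes")
```

Afterward the file holds both lines, in the order they were written.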
Recall from Chapter 3 that you can prevent the shell from
overwriting a file with > file
by typing set -o noclobber.
The >| operator overrides noclobber -- it's
the "Do it anyway, darn it!" redirector.
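Here is a small sketch of the difference, again assuming a scratch directory from mktemp (workdir and target are hypothetical names). The failing redirection is wrapped in a subshell only so that its error message can be discarded:

```shell
workdir=$(mktemp -d)
target=$workdir/report.txt
printf 'original\n' > "$target"

set -o noclobber            # refuse to overwrite existing files with >

# With noclobber set, > on an existing file fails and the file is untouched.
if (printf 'clobbered\n' > "$target") 2>/dev/null; then
    plain_result=overwritten
else
    plain_result=refused
fi
after_plain=$(cat "$target")

# >| overrides noclobber for this one redirection.
printf 'replaced\n' >| "$target"
after_force=$(cat "$target")

set +o noclobber            # restore the default behavior
```

The first attempt leaves the file alone; the second replaces its contents even though noclobber is still in effect.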
Unix systems allow you to open files read-only, write-only, and read-write.
The < redirector opens the input file read-only; if a program attempts
to write on standard input, it will receive an error.
Similarly, the > redirector opens the output file write-only; attempting
to read from standard output generates an error.
The <> redirector opens a file for both reading and writing, by
default on standard input. It is up to the invoked program to
notice this and take advantage of the fact, but it is useful in the case where a program
may want to update data in a file "in place."
This operator is most often used for writing networking clients; see
Section 7.1.4, later in this chapter,
for an example.
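To get a feel for "in place" updating without any networking, consider this minimal sketch. It assumes a scratch file (the names workdir and datafile are invented) and attaches <> to file descriptor 1 by prefixing the digit, so that an ordinary printf can do the writing:

```shell
workdir=$(mktemp -d)
datafile=$workdir/record.txt
printf 'hello world\n' > "$datafile"

# 1<> opens the file read-write on standard output without truncating it,
# so the five bytes written replace only the first five bytes of the file.
printf 'HELLO' 1<> "$datafile"

updated=$(cat "$datafile")
```

Had the second command used a plain >, the file would have been emptied first and would contain only HELLO.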
7.1.2. Here-Strings
A common idiom in shell programming is to use print
to generate some text to be further processed by one or more commands:
# start with a mild interrogation
print -r "$name, $rank, $serial_num" | interrogate -i mild
This could be rewritten to use a here-document, which is
slightly more efficient, although not necessarily any
easier to read:
# start with a mild interrogation
interrogate -i mild << EOF
$name, $rank, $serial_num
EOF
Starting with ksh93n,[93]
the Korn shell provides a new
form of here-document, using three less-than signs:
program <<< WORD
In this form, the text of WORD
(followed by a trailing newline) becomes
the input to the program.
For example:
# start with a mild interrogation
interrogate -i mild <<< "$name, $rank, $serial_num"
This notation originated in the Unix version of the rc
shell, where it is called a "here string."
It was later picked up by the Z shell, zsh
(see Appendix A), from which the Korn shell borrowed it.
This notation is simple, easy to use, efficient, and visually distinguishable
from regular here-documents.
7.1.3. File Descriptors
The next few redirectors in
Table 7-1
depend on the notion of a
file descriptor.
This is a low-level Unix I/O concept that
is vital to understand when programming in C or C++. It
appears at the shell level when you want to do anything that
doesn't involve standard input, standard output and standard error.
You can get by with a few basic facts about them; for
the whole story, look at the
open(2),
creat(2),
read(2),
write(2),
dup(2),
dup2(2),
fcntl(2),
and
close(2)
entries in the Unix manual.
(As the manual entries are aimed at the C programmer, their
relationship to the shell concepts won't necessarily be obvious.)
File descriptors are integers starting at 0 that index
an array of file information within a process. When a
process starts, it has three file descriptors open.
These correspond to the three standard streams: standard input (file descriptor
0), standard output (1), and standard error (2). If a process
opens Unix files for input or output,
they are assigned to the
next available file descriptors, starting with 3.
By far the most common use of file descriptors with the Korn shell
is in saving standard error in a file. For example, if you want
to save the error messages from a long job in a file so that they
don't scroll off the screen, append 2> file
to your command.
If you also want to save standard output, append
> file1
2> file2.
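As a quick illustration, here is a minimal sketch in which a made-up function, longjob, stands in for the long job and writes one line to each stream. The file names and the scratch directory are assumptions for the sake of the example:

```shell
workdir=$(mktemp -d)

# Hypothetical stand-in for a long job: writes one line to each stream.
longjob() {
    printf 'result data\n'
    printf 'warning: something odd\n' >&2
}

# Standard output goes to one file, standard error to another.
longjob > "$workdir/out.log" 2> "$workdir/err.log"

saved_output=$(cat "$workdir/out.log")
saved_errors=$(cat "$workdir/err.log")
```

Each file ends up with only the line that was written to its stream.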
This leads to Task 7-3.
We'll call this function start. The code is very terse:
function start {
    "$@" > logfile 2>&1 &
}
This line executes whatever command and parameters follow start.
(The command cannot contain pipes or output redirectors.)
It first sends the command's standard output to logfile.
Then, the redirector 2>&1 says, "Send standard error (file descriptor
2) to the same place as standard output (file descriptor 1)."
2>&1 is actually a combination of two redirectors in
Table 7-1:
n> file and
>&n.
Since standard output is redirected to logfile, standard error
will go there too.
The final & puts the job in the background so that you get
your shell prompt back.
As a small variation on this theme, we can send
both standard output and standard error into a pipe instead
of a file: command 2>&1 | ... does this.
(Why this works is described shortly.)
Here is a function that sends both standard output and
standard error to the logfile (as above) and to the terminal:
function start {
    "$@" 2>&1 | tee logfile &
}
The command tee(1) takes its standard input and copies it
to standard output and the file given as argument.
These functions have one shortcoming: you must remain logged in until
the job completes.
Although you can always type jobs (see
Chapter 1) to check on progress, you can't leave your office for the day
unless you want to risk a breach
of security or waste electricity. We'll see how to solve this problem in
Chapter 8.
The other file-descriptor-oriented redirectors
(e.g., <&n)
are usually used for reading input from (or writing output to)
more than one file at the same time. We'll see an example later in this chapter. Otherwise, they're
mainly meant for systems programmers, as are
<&- (force standard input to close) and >&-
(force standard output to close),
<&n-
(move file descriptor n to standard input)
and
>&n-
(move file descriptor n to standard output).
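In the meantime, here is one way the idea might be sketched: two files are opened on descriptors 3 and 4 and read in step, one line from each per iteration. The file names and contents are invented, and the exec command used to open and close the descriptors is only assumed here; it is covered properly elsewhere in the book:

```shell
workdir=$(mktemp -d)
printf 'alice\nbob\n' > "$workdir/names.txt"
printf '101\n102\n' > "$workdir/ids.txt"

# Open each file on its own descriptor; 3 and 4 are the first ones free
# after standard input, standard output, and standard error.
exec 3< "$workdir/names.txt" 4< "$workdir/ids.txt"

# For each read, <&n duplicates descriptor n onto standard input, so one
# line is taken from each file per iteration.
paired=""
while read -r name <&3 && read -r id <&4; do
    paired="$paired$name=$id "
done

# Close both descriptors now that we are done with them.
exec 3<&- 4<&-
```

Because each file keeps its own position, the two reads never interfere with each other.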
Finally,
note that
0< is the same as <,
and
1> is the same as >.
(In fact, 0 is the default for any operator that begins with <,
and 1 is the default for any operator that begins with >.)
7.1.3.1. Redirector ordering
The shell processes I/O redirections in a specific order.
Once you understand how this works, you can take advantage of it, particularly for
managing the disposition of standard output and standard error.
The first thing the shell does is set up the standard input and output for pipelines as
indicated by the | character.
After that, it processes the changing of individual file descriptors.
As we just saw,
the most common idiom that takes advantage of this is to send both standard output
and standard error down the same pipeline to a pager program, such
as more or less.[94]
$ mycommand -h fred -w wilma 2>&1 | more
In this example, the shell first sets the standard output of mycommand to
be the pipe to more.
It then redirects standard error (file descriptor 2) to be the same as standard
output (file descriptor 1), i.e., the pipe.
When a command line has only redirectors, the shell processes them left to right, in the order
they occur. An example similar to the following has been in the shell man page since the
original Version 7 Bourne shell:
program > file1 2>&1    # Standard output and standard error to file1
program 2>&1 > file1    # Standard error to terminal and standard output to file1
In the first case, standard output is sent to
file1, and standard error is then sent to
where standard output is, i.e., file1.
In the second case, standard error is sent to where standard output is, which is still the terminal.
The standard output is then redirected to file1, but only the standard output.
If you understand this, you
probably know all you need to know about file descriptors.
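If you would like to convince yourself, the two cases can be tried with a minimal sketch like the following. The function emit and the file names are made up, and a command substitution stands in for the terminal so that whatever "escapes" the file can be captured in a variable:

```shell
workdir=$(mktemp -d)

# Hypothetical command that writes one line to each stream.
emit() {
    printf 'out\n'
    printf 'err\n' >&2
}

# Case 1: standard output is pointed at the file first, then standard
# error is made a copy of it, so both lines land in the file.
emit > "$workdir/both.log" 2>&1
both=$(cat "$workdir/both.log")

# Case 2: standard error is first made a copy of standard output, which
# at that moment is still the command substitution's pipe; only afterward
# is standard output moved to the file.
escaped=$(emit 2>&1 > "$workdir/only_out.log")
only_out=$(cat "$workdir/only_out.log")
```

In the first case the file holds both lines. In the second, the file holds only the standard output line, and the error line turns up in escaped, just as it would have turned up on the terminal.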
7.1.4. Special Filenames
Normally, when you provide a pathname after an I/O redirector such as
< or >, the shell tries to
open an actual file that has the given filename. However, there are two
kinds of pathnames that the shell treats specially.
The first kind of pathname is /dev/fd/N,
where N is the file descriptor number
of an already open file.
For example:
# assume file descriptor 6 is already open on a file
print 'something meaningful' > /dev/fd/6 # same as 1>&6
This works even on systems that don't have a /dev/fd directory.
This kind of pathname may also be used with the various file attribute test operators
of the [[...]] command.
The second kind of pathname allows access to Internet services via either
the TCP or UDP protocol. The pathnames are:
- /dev/tcp/host/port
  Using TCP, connect to remote host host on remote port port.
  The host may be given as an IP address in dotted-decimal
  notation (1.2.3.4) or as a hostname (www.oreilly.com).
  Similarly, the port for the desired service may be a symbolic name (typically as
  found in /etc/services) or a numeric port number.[95]
- /dev/udp/host/port
  This is the same, but using UDP.
To use these files for two-way I/O, open a new file descriptor using the
exec command (which is described in
Chapter 9), using the "read and write"
operator, <>.
Then use read -u and
print -u to read from and write to the
new file descriptor.
(The read command and the -u
option to read and print are
described later in this chapter.)
The following example, courtesy of David Korn, shows how to do this.
It implements the whois(1) program, which
provides information about the registration of Internet domain names:
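host=rs.internic.net
port=43
# open a two-way TCP connection to the server on file descriptor 3
exec 3<> /dev/tcp/$host/$port
# send each argument to the server, terminated by carriage return and newline
print -u3 -f "%s\r\n" "$@"
# copy the server's reply to standard output
cat <&3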
Using the exec built-in command
(see Chapter 9), this program uses the
"read-and-write" operator, <>, to open a two-way
connection to the host rs.internic.net
on TCP port 43, which provides the whois service.
(The script could have used port=whois as well.)
It then uses the print command to send the
argument strings to the whois server.
Finally, it reads the returned result using cat.
Here is a sample run:
$ whois.ksh kornshell.com
Whois Server Version 1.3
Domain names in the .com, .net, and .org domains can now be registered
with many different competing registrars. Go to http://www.internic.net
for detailed information.
Domain Name: KORNSHELL.COM
Registrar: NETWORK SOLUTIONS, INC.
Whois Server: whois.networksolutions.com
Referral URL: http://www.networksolutions.com
Name Server: NS4.PAIR.COM
Name Server: NS0.NS0.COM
Updated Date: 02-dec-2001
>>> Last update of whois database: Sun, 10 Feb 2002 05:19:14 EST <<<
The Registry database contains ONLY .COM, .NET, .ORG, .EDU domains and
Registrars.
Network programming is beyond the scope of this book. But for most things, you will probably
want to use TCP connections instead of UDP connections if you do write any networking
programs in ksh.
Copyright © 2003 O'Reilly & Associates. All rights reserved.