36.15. Overview: Open Files and File Descriptors
This introduction is general and
simplified. If you're a technical person who needs a
complete and exact description, read a book on Unix programming.
Unix shells let you redirect the
input and output of programs with operators such as
> and |. How does that work?
How can you use it better? Here's an overview.
When the Unix kernel
starts any process (Section 24.3) -- for example, grep,
ls, or a shell -- it sets up several places for
that process to read from and write to, as shown in Figure 36-1.
Figure 36-1. Open standard I/O files with no command-line redirection
These places are called open files. The kernel
gives each file a number called a file
descriptor. But people usually use names for these places
instead of the numbers:
Standard input or stdin (File Descriptor (F.D.) number 0) is the place where the process can read text. This might be text from other programs (through a pipe, on the command line) or from your keyboard.
Standard output or stdout (F.D. 1) is a place for the process to write its results.
Standard error or stderr (F.D. 2) is where the process can send error messages.
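To see that the three descriptors really are separate places, you can run a child process and capture its standard output and standard error independently. The sketch below is in Python (an assumption; the text names no language) and uses `os.read` and `os.write`, which work directly on descriptor numbers rather than file names. The child program and the function name `run_child` are hypothetical, for illustration only.

```python
from __future__ import annotations

import subprocess
import sys

# Hypothetical child program: it uses raw descriptor numbers, never file names.
CHILD = """
import os
data = b""
while True:
    chunk = os.read(0, 4096)          # fd 0: standard input
    if not chunk:                     # an empty read means end of input
        break
    data += chunk
os.write(1, b"result: " + data)       # fd 1: standard output
os.write(2, b"warning: demo\\n")      # fd 2: standard error
"""


def run_child(text: bytes) -> tuple[bytes, bytes]:
    """Feed text to the child's fd 0; collect its fd 1 and fd 2 separately."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=text,
        stdout=subprocess.PIPE,   # the child's fd 1 becomes one pipe...
        stderr=subprocess.PIPE,   # ...and its fd 2 a different pipe
        check=True,
    )
    return proc.stdout, proc.stderr


if __name__ == "__main__":
    out, err = run_child(b"hello\n")
    print("stdout:", out)
    print("stderr:", err)
```

The child never learns where its output goes; the parent decided that when it connected descriptors 1 and 2 to two different pipes.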
By default, as Figure 36-1 shows, the file
that's opened for stdin,
stdout, and stderr is
/dev/tty -- a name for your terminal. This
makes life easier for users -- and programmers, too. The user
doesn't have to tell a program where to read or
write because the default is your terminal. A programmer
doesn't have to open files to read from or write to (in
many cases); the programs can just read from
stdin, write to stdout, and
send errors to stderr.
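Here is a minimal sketch, again assuming Python, of a program that never opens a file: it reads fd 0, writes fd 1, and sends its one complaint to fd 2, leaving whoever started it (usually the shell) to decide what those descriptors are connected to. The name `upcase_filter` is hypothetical, and its descriptor parameters exist only so the sketch can be tried with pipes; their defaults are the standard descriptors.

```python
import os


def upcase_filter(in_fd: int = 0, out_fd: int = 1, err_fd: int = 2) -> int:
    """Copy in_fd to out_fd in uppercase; return the number of bytes copied."""
    total = 0
    while True:
        chunk = os.read(in_fd, 4096)
        if not chunk:                       # end of input
            break
        data = chunk.upper()
        while data:                         # os.write may write only part of the buffer
            written = os.write(out_fd, data)
            data = data[written:]
        total += len(chunk)
    if total == 0:
        os.write(err_fd, b"upcase_filter: no input\n")
    return total
```

Saved as a script that ends by calling `upcase_filter()` (say, a hypothetical `upcase.py`), it would work unchanged with the keyboard, with `echo hello | python3 upcase.py`, or with `python3 upcase.py < somefile`.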
It gets better. When the shell starts a
process (when you type a command at a prompt), you can tell the shell
what file to "connect to" any of
those file descriptors. For example, Figure 36-2
shows what happens when you run grep and make the
shell redirect grep's standard
output away from the terminal to a file named
Figure 36-2. Standard output redirected to a file
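Roughly speaking, a shell handles `>` like this: it forks, and in the child it opens the target file, moves that descriptor onto number 1 with `dup2`, and then execs the command, which simply writes to fd 1 as it always does. The Python sketch below assumes a Unix system and Python 3.9 or later (it uses `os.fork` and `os.waitstatus_to_exitcode`); the function name `run_redirected` is hypothetical, and real shells handle many more cases, such as `>>` and `noclobber`.

```python
import os


def run_redirected(argv: list[str], path: str) -> int:
    """Run argv with standard output sent to path, roughly like `argv > path`."""
    pid = os.fork()
    if pid == 0:                                   # child process
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            os.dup2(fd, 1)                         # fd 1 now refers to the file
            os.close(fd)                           # the spare descriptor isn't needed
            os.execvp(argv[0], argv)               # replaces this process on success
        finally:
            os._exit(127)                          # reached only if open/dup2/exec failed
    _, status = os.waitpid(pid, 0)                 # parent: wait for the command
    return os.waitstatus_to_exitcode(status)
```

Note that the command itself is never told about the file; it inherits a descriptor 1 that already points there.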
Programs can read
and write files besides the ones on stdin,
stdout, and stderr. For
instance, in Figure 36-2, grep
opened the file somefile itself -- it
didn't use any of the standard file descriptors for
somefile. A Unix convention is that if you
don't name any files on the command line, a program
will read from its standard input. Programs that work that way are called filters.
All shells can do basic redirection with stdin,
stdout, and stderr. But as
you'll see in Section 36.16,
the Bourne shell also handles file descriptors 3 through 9 (and
bash and the other newer shells can handle
arbitrary numbers of file descriptors, up to whatever limit
ulimit -n happens to be set to).
That's sometimes useful.
Maybe you have a few data files that you want to keep reading from or
writing to. Instead of giving their names each time, you can use the file descriptor numbers.
Once you open a file, the kernel remembers what place in the file you
last read from or wrote to. Each time you use that file descriptor
number while the file is open, you'll pick up at the place
where you left off. That's especially nice when you
want to read from or write to the same file with more than one
program. For example, the
line command on some Unix systems reads exactly one
line from its standard input -- if that standard input is an open file, you can call line over and
over, whenever you want to read the next line from the file. Once the
file has been opened, you can remove its link (name) from the
directory; the process can access the file through its descriptor
without using the name.
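The sketch below, assuming Python on a Unix system, illustrates both points in this paragraph. `read_line` is a hypothetical helper in the spirit of line (not the real command): it reads one byte at a time so it never consumes past a newline, and because the kernel keeps the file offset, each call starts where the previous one stopped. The `demo` function also unlinks the file's name right after opening it and keeps reading through the descriptor.

```python
import os
import tempfile


def read_line(fd: int) -> bytes:
    """Read one line from fd, a byte at a time, leaving the offset just past it."""
    buf = bytearray()
    while True:
        b = os.read(fd, 1)
        if not b:            # end of file: return whatever was read (maybe b"")
            break
        buf += b
        if b == b"\n":
            break
    return bytes(buf)


def demo() -> list[bytes]:
    # Create a small scratch file to stand in for a data file.
    fd0, path = tempfile.mkstemp()
    os.write(fd0, b"first\nsecond\nthird\n")
    os.close(fd0)

    fd = os.open(path, os.O_RDONLY)
    os.unlink(path)          # remove the name; the descriptor still reaches the data
    lines = [read_line(fd) for _ in range(4)]   # the fourth call hits end of file
    os.close(fd)             # last descriptor closed: the kernel can free the data
    return lines
```

Reading a byte at a time is slow, but it is the simple way to avoid reading ahead, so whatever reads the descriptor next starts exactly at the following line.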
When Unix starts a new subprocess (Section 24.3), the
open file descriptors are inherited by that process. A subprocess can
read from or write to file descriptors opened by its parent process. A
redirected-I/O loop, as discussed in Section 43.6, takes advantage of this.
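A minimal sketch of inheritance, assuming Python on a Unix system: the parent opens a file and forks, and the child reads the first line through the descriptor it inherited. Because parent and child share the same open file, and therefore the same offset, the parent's next read starts at the second line. The child reports what it read through a pipe, which is another inherited descriptor. `read_line` and `shared_offset_demo` are hypothetical names.

```python
import os


def read_line(fd: int) -> bytes:
    """Read one line from fd, a byte at a time, so the offset stops right after it."""
    buf = bytearray()
    while True:
        b = os.read(fd, 1)
        if not b:
            break
        buf += b
        if b == b"\n":
            break
    return bytes(buf)


def shared_offset_demo(path: str) -> tuple[bytes, bytes]:
    fd = os.open(path, os.O_RDONLY)
    r, w = os.pipe()                     # lets the child report back to the parent
    pid = os.fork()
    if pid == 0:                         # child: inherits fd, r, and w
        status = 1
        try:
            os.close(r)
            os.write(w, read_line(fd))   # consumes line 1, moving the shared offset
            status = 0
        finally:
            os._exit(status)
    os.close(w)                          # parent keeps only the read end
    os.waitpid(pid, 0)
    child_line = b""
    while chunk := os.read(r, 4096):     # read until the child's end is closed
        child_line += chunk
    os.close(r)
    parent_line = read_line(fd)          # continues where the child stopped
    os.close(fd)
    return child_line, parent_line
```

The same sharing is why a command run inside a loop whose input is redirected from a file will consume lines from that file if the command reads its standard input.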
Copyright © 2003 O'Reilly & Associates. All rights reserved.