Overview: Open Files and File Descriptors (Unix Power Tools, 3rd Edition)

36.15. Overview: Open Files and File Descriptors

This introduction is general and simplified. If you're a technical person who needs a complete and exact description, read a book on Unix programming.

Unix shells let you redirect the input and output of programs with operators such as > and |. How does that work? How can you use it better? Here's an overview.

When the Unix kernel starts any process (Section 24.3) -- for example, grep, ls, or a shell -- it sets up several places for that process to read from and write to, as shown in Figure 36-1.

Figure 36-1. Open standard I/O files with no command-line redirection

These places are called open files. The kernel gives each open file a number called a file descriptor. But people usually use names for these places instead of the numbers:

The standard input or stdin (file descriptor, or F.D., number 0) is the place where the process can read text. This might be text from other programs (through a pipe on the command line) or from your keyboard.
The standard output or stdout (F.D. 1) is a place for the process to write its results.
The standard error or stderr (F.D. 2) is where the process can send error messages.
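
As a rough illustration of the three descriptors in the list above, here is a minimal sketch. The function name shout and the scratch directory are made up for this example; they aren't part of any standard tool.

```shell
# Hypothetical scratch directory for this demonstration.
demo_dir=$(mktemp -d)

# A made-up function that uses all three standard descriptors.
shout() {
    tr 'a-z' 'A-Z'              # reads F.D. 0 (stdin), writes F.D. 1 (stdout)
    echo "shout: done" >&2      # writes F.D. 2 (stderr)
}

# Feed stdin through a pipe; send stdout and stderr to separate files.
echo "hello" | shout > "$demo_dir/out" 2> "$demo_dir/err"

out_text=$(cat "$demo_dir/out")
err_text=$(cat "$demo_dir/err")
echo "stdout got: $out_text"
echo "stderr got: $err_text"

rm -rf "$demo_dir"
```

Because the results and the message travel on different descriptors, each one can be sent somewhere different.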

By default, as Figure 36-1 shows, the file that's opened for stdin, stdout, and stderr is /dev/tty -- a name for your terminal. This makes life easier for users -- and programmers, too. As a user, you don't have to tell a program where to read or write because the default is your terminal. A programmer doesn't have to open files to read from or write to (in many cases); the program can just read from stdin, write to stdout, and send errors to stderr.
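
If you're curious whether a descriptor is still connected to the terminal, the test command's -t operator can tell you. This is only a sketch; the function name check_fd is invented for the example.

```shell
# check_fd is a hypothetical helper: $1 is a file descriptor number.
# "test -t N" (written here as [ -t N ]) succeeds if descriptor N is
# open on a terminal.
check_fd() {
    if [ -t "$1" ]; then
        echo "F.D. $1: terminal"
    else
        echo "F.D. $1: not a terminal"
    fi
}

# What you see for these three depends on how the script was started.
check_fd 0
check_fd 1
check_fd 2

# Inside command substitution, F.D. 1 is a pipe back to the shell,
# so it is not a terminal there.
piped=$(check_fd 1)
echo "inside \$(...): $piped"
```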

It gets better. When the shell starts a process (when you type a command at a prompt), you can tell the shell what file to "connect to" any of those file descriptors. For example, Figure 36-2 shows what happens when you run grep and make the shell redirect grep's standard output away from the terminal to a file named grepout.

Figure 36-2. Standard output redirected to a file
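
The setup in Figure 36-2 might be typed something like this. The names somefile and grepout come from the figure; the search pattern, the file contents, and the other filenames are assumptions made up for the sketch.

```shell
# Hypothetical scratch directory and sample data.
demo_dir=$(mktemp -d)
printf 'apple\nbanana\napricot\n' > "$demo_dir/somefile"

# The shell opens grepout and connects it to grep's F.D. 1 before
# grep starts. grep itself just writes to its standard output.
grep '^a' "$demo_dir/somefile" > "$demo_dir/grepout"
matches=$(cat "$demo_dir/grepout")

# Redirecting stdout doesn't touch stderr. Here nofile doesn't exist,
# so grep's complaint goes to F.D. 2, which is redirected separately.
grep_status=0
grep '^a' "$demo_dir/nofile" > "$demo_dir/grepout2" 2> "$demo_dir/greperr" ||
    grep_status=$?
err_text=$(cat "$demo_dir/greperr")

echo "matches saved in grepout:"
echo "$matches"
echo "grep exit status for the missing file: $grep_status"

rm -rf "$demo_dir"
```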

Programs can read and write files besides the ones on stdin, stdout, and stderr. For instance, in Figure 36-2, grep opened the file somefile itself -- it didn't use any of the standard file descriptors for somefile. A Unix convention is that if you don't name any files on the command line, a program will read from its standard input. Programs that work that way are called filters.
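
Here's a small sketch of the filter convention, using grep. The sample file and its contents are invented for the example.

```shell
# Hypothetical sample file.
demo_file=$(mktemp)
printf 'one\ntwo\nthree\n' > "$demo_file"

# With a filename argument, grep opens the file itself.
from_name=$(grep t "$demo_file")

# With no filename, grep acts as a filter and reads F.D. 0.
# Here the shell opens the file on grep's stdin...
from_stdin=$(grep t < "$demo_file")

# ...and here stdin is a pipe from another program.
from_pipe=$(cat "$demo_file" | grep t)

echo "$from_name"

rm -f "$demo_file"
```

All three commands find the same lines; the only difference is who opens the file and which descriptor grep reads it from.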

All shells can do basic redirection with stdin, stdout, and stderr. But as you'll see in Section 36.16, the Bourne shell also handles file descriptors 3 through 9 (and bash and the other newer shells can handle arbitrary file descriptor numbers, up to whatever limit ulimit -n happens to be set to). That's useful sometimes:

Maybe you have a few data files that you want to keep reading from or writing to. Instead of giving their names, you can use the file descriptor numbers.

Once you open a file, the kernel remembers what place in the file you last read from or wrote to. Each time you use that file descriptor number while the file is open, you'll be at the same place in the file. That's especially nice when you want to read from or write to the same file with more than one program. For example, the line command on some Unix systems reads one line from a file -- you can call line over and over, whenever you want to read the next line from a file. Once the file has been opened, you can remove its link (name) from the directory; the process can access the file through its descriptor without using the name.
When Unix starts a new subprocess (Section 24.3), the open file descriptors are given to that process. A subprocess can read from or write to file descriptors opened by its parent process. A redirected-I/O loop, as discussed in Section 43.6, takes advantage of this.
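
The points above can be tried out with a sketch like this one. It assumes a Bourne-type shell and uses the shell's built-in read in place of the line command, which isn't available everywhere. The file contents and variable names are made up.

```shell
# Hypothetical sample file.
demo_file=$(mktemp)
printf 'first\nsecond\nthird\n' > "$demo_file"

# Open the file for reading on F.D. 3 in the current shell.
exec 3< "$demo_file"

# The name can go away now; the open descriptor still reaches the file.
rm "$demo_file"

# Each read starts where the previous one stopped, because the
# kernel remembers the position for the open file.
read -r line1 <&3
read -r line2 <&3

# A subprocess inherits F.D. 3, position included, so cat sees
# only what hasn't been read yet.
rest=$(cat <&3)

# Close F.D. 3 when finished.
exec 3<&-

echo "line1=$line1 line2=$line2 rest=$rest"
```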

-- JP