1.7 Input and Output

The software field, and really any scientific field, tends to advance most quickly and impressively on those few occasions when someone (i.e., not a committee) comes up with an idea that is small in concept yet enormous in its implications. The standard input and output scheme of UNIX has to be on the short list of such ideas, along with such classic innovations as the LISP language, the relational data model, and object-oriented programming.

The UNIX I/O scheme is based on two dazzlingly simple ideas. First, UNIX file I/O takes the form of arbitrarily long sequences of characters (bytes). In contrast, file systems of older vintage have more complicated I/O schemes (e.g., "block," "record," and "card image"). Second, everything on the system that produces or accepts data is treated as a file; this includes hardware devices like disk drives and terminals. Older systems treated every device differently. Both of these ideas have made systems programmers' lives much more pleasant.
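
As a small illustration of the second idea, the sketch below writes to and reads from an ordinary file and a device with exactly the same notation. It assumes a system that provides /dev/null and the mktemp command; the file name greeting is made up, and the > and < notation is explained in Section 1.7.2.

```shell
# Scratch directory for the example (assumes mktemp is available).
tmpdir=$(mktemp -d)

# The same output notation works for an ordinary file and for a device.
echo "hello" > "$tmpdir/greeting"   # ordinary file: the bytes are stored
echo "hello" > /dev/null            # device: the bytes are discarded

# Both can be read back as plain streams of bytes.
file_bytes=$(wc -c < "$tmpdir/greeting")
null_bytes=$(wc -c < /dev/null)
echo "file: $file_bytes bytes, /dev/null: $null_bytes bytes"

rm -r "$tmpdir"
```

The commands never need to know which kind of object they are talking to; that is the practical payoff of treating devices as files.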

1.7.1 Standard I/O

By convention, each UNIX program has a single way of accepting input called standard input, a single way of producing output called standard output, and a single way of producing error messages called standard error output, usually shortened to standard error. Of course, a program can have other input and output sources as well, as we will see in Chapter 7.

Standard I/O was the first scheme of its kind that was designed specifically for interactive users at terminals, rather than the older batch style of use that usually involves decks of punch-cards. Since the UNIX shell provides the user interface, it should come as no surprise that standard I/O was designed to fit in very neatly with the shell.

All shells handle standard I/O in basically the same way. Each program that you invoke has all three standard I/O channels set to your terminal or workstation, so that standard input is your keyboard, and standard output and error are your screen or window. For example, the mail utility prints messages to you on the standard output, and when you use it to send messages to other users, it accepts your input on the standard input. This means that you view messages on your screen and type new ones in on your keyboard.

When necessary, you can redirect input and output to come from or go to a file instead. If you want to send the contents of a pre-existing file to someone as mail, you redirect mail's standard input so that it reads from that file instead of your keyboard.

You can also hook up programs into a pipeline, in which the standard output of one program feeds directly into the standard input of another; for example, you could feed mail output directly to the lp program so that messages are printed instead of shown on the screen.

This makes it possible to use UNIX utilities as building blocks for bigger programs. Many UNIX utility programs are meant to be used in this way: they each perform a specific type of filtering operation on input text. Although this isn't a textbook on UNIX utilities, they are essential to productive shell use. The more popular filtering utilities are listed in Table 1.5.

Table 1.5: Popular UNIX Data Filtering Utilities
Utility	Purpose
cat	Copy input to output
grep	Search for strings in the input
sort	Sort lines in the input
cut	Extract columns from input
sed	Perform editing operations on input
tr	Translate characters in the input to other characters

You may have used some of these before and noticed that they take names of input files as arguments and produce output on standard output. You may not know, however, that all of them (and most other UNIX utilities) accept input from standard input if you omit the argument. [8]

[8] If a particular UNIX utility doesn't accept standard input when you leave out the filename argument, try using - as the argument.
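
To see the point about omitted filename arguments, here is a short sketch that runs sort three ways on the same data. The file name fruit and its contents are invented for illustration, and the < notation used to supply standard input is covered in Section 1.7.2.

```shell
tmpdir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$tmpdir/fruit"   # hypothetical sample file

with_arg=$(sort "$tmpdir/fruit")        # filename given as an argument
from_stdin=$(sort < "$tmpdir/fruit")    # no argument: sort reads standard input
with_dash=$(sort - < "$tmpdir/fruit")   # "-" as the argument also means standard input

echo "$from_stdin"
rm -r "$tmpdir"
```

All three invocations should produce the same sorted list; only the way the data reaches sort differs.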

For example, the most basic utility is cat, which simply copies its input to its output. If you type cat with a filename argument, it will print out the contents of that file on your screen. But if you invoke it with no arguments, it will expect standard input and copy it to standard output. Try it: cat will wait for you to type a line of text; when you type RETURN, cat will parrot the text back at you. To stop the process, hit [CTRL-D] at the beginning of a line (see below for what this character means). You will see ^D when you type [CTRL-D]. Here's what this should look like:

$ cat
Here is a line of text.
Here is a line of text.
This is another line of text.
This is another line of text.
^D
$

1.7.2 I/O Redirection

cat is actually short for "catenate," i.e., link together. It accepts multiple filename arguments and copies them to the standard output. But let's pretend, for the moment, that cat and other utilities don't accept filename arguments and accept only standard input. As we said above, the shell lets you redirect standard input so that it comes from a file. The notation command < filename does this; it sets things up so that command takes standard input from a file instead of from a terminal.

For example, if you have a file called fred that contains some text, then cat < fred will print fred's contents out onto your terminal. sort < fred will sort the lines in the fred file and print the result on your terminal (remember: we're pretending that utilities don't take filename arguments).

Similarly, command > filename causes the command's standard output to be redirected to the named file. The classic example of this is date > now: the date command prints the current date and time on the standard output; the command above saves it in a file called now.
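
The following sketch tries both redirectors on the date example. It works in a scratch directory (an assumption made here so that no existing file named now is overwritten) and relies on the mktemp command being available.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir"             # scratch directory, so no existing file is touched

date > now               # nothing is printed; the output lands in the file "now"
saved=$(cat < now)       # read the file back through standard input
echo "$saved"

cd / && rm -r "$tmpdir"  # clean up the scratch directory
```

Note that > replaces the previous contents of the target file if it already exists, which is why the sketch avoids running in a directory you care about.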

Input and output redirectors can be combined. For example: the cp command is normally used to copy files; if for some reason it didn't exist or was broken, you could use cat in this way:

$ cat < file1 > file2

This would be similar to cp file1 file2.
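
A quick way to convince yourself that the copy is faithful is to compare the two files afterward. This sketch uses made-up file contents in a scratch directory and the standard cmp utility, which is not otherwise discussed in this section.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir"
printf 'first line\nsecond line\n' > file1   # invented contents for file1

cat < file1 > file2                          # copy via standard input and output

# cmp -s is silent and succeeds only if the files are byte-for-byte identical.
if cmp -s file1 file2; then copy_ok=yes; else copy_ok=no; fi
echo "identical: $copy_ok"

cd / && rm -r "$tmpdir"
```

The contents match, but the result is only "similar" to cp: file2 is created with default permissions rather than with file1's mode, so an executable file copied this way would lose its execute permission.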

1.7.3 Pipelines

It is also possible to redirect the output of a command into the standard input of another command instead of a file. The construct that does this is called the pipe, notated as |. A command line that includes two or more commands connected with pipes is called a pipeline.

Pipes are very often used with the more command, which works just like cat except that it prints its output screen by screen, pausing for the user to type SPACE (next screen), RETURN (next line), or other commands. If you're in a directory with a large number of files and you want to see details about them, ls -l | more will give you a detailed listing a screen at a time.

Pipelines can get very complex (see, for example, the lsd function in Chapter 4 or the pipeline version of the C compiler driver in Chapter 7); they can also be combined with other I/O redirectors. To see a sorted listing of the file fred a screen at a time, type sort < fred | more. To print it instead of viewing it on your terminal, type sort < fred | lp.
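
Because more is interactive and lp needs a printer, the sketch below shows the same combination of input redirection and a pipe with tr as the second stage instead. The contents of fred are invented for the example.

```shell
tmpdir=$(mktemp -d)
printf 'wilma\nbarney\nfred\n' > "$tmpdir/fred"   # made-up contents for the file fred

# sort reads the file on standard input; its output feeds tr through the pipe.
result=$(sort < "$tmpdir/fred" | tr '[:lower:]' '[:upper:]')
echo "$result"

rm -r "$tmpdir"
```

Swapping tr for more or lp changes only where the sorted lines end up; the first half of the command line stays the same.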

Here's a more complicated example. The file /etc/passwd stores information about users' accounts on a UNIX system. Each line in the file contains a user's login name, encrypted password, user ID number, group ID number, full name, home directory, and login shell. The first field of each line is the login name; fields are separated by colons (:). A sample line might look like this:

billr:5Ae40BGR/tePk:284:93:Bill Rosenblatt:/home/billr:/bin/ksh

To get a sorted listing of all users on the system, type:

$ cut -d: -f1 < /etc/passwd | sort

(Actually, you can omit the <, since cut accepts input filename arguments.) The cut command extracts the first field (-f1), where fields are separated by colons (-d:), from the input. The entire pipeline will print a list that looks like this:

al
billr
bob
chris
dave
ed
frank
...
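
If you would rather experiment without touching the real /etc/passwd, the pipeline can be tried on a small file in the same format. In this sketch the billr line is the sample from the text; the other two lines, and the file location, are invented.

```shell
tmpdir=$(mktemp -d)

# Hypothetical sample data in /etc/passwd format.
printf '%s\n' \
    'chris:x:301:93:Chris Example:/home/chris:/bin/ksh' \
    'billr:5Ae40BGR/tePk:284:93:Bill Rosenblatt:/home/billr:/bin/ksh' \
    'al:x:290:93:Al Example:/home/al:/bin/sh' > "$tmpdir/passwd"

users=$(cut -d: -f1 < "$tmpdir/passwd" | sort)      # input via redirection
users_arg=$(cut -d: -f1 "$tmpdir/passwd" | sort)    # input via filename argument

echo "$users"
rm -r "$tmpdir"
```

Both forms should yield the same list of login names, one per line, in sorted order.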

If you want to send the list directly to the printer (instead of your screen), you can extend the pipeline like this:

$ cut -d: -f1 < /etc/passwd | sort | lp

Now you should see how I/O redirectors and pipelines support the UNIX building block philosophy. The notation is extremely terse and powerful. Just as important, the pipe concept eliminates the need for messy temporary files to store output of commands before it is fed into other commands.

For example, to do the same sort of thing as the above command line on other operating systems (assuming that equivalent utilities were available), you would need three commands. On DEC's VAX/VMS system, they might look like this:

$ cut [etc]passwd /d=":" /f=1 /out=temp1
$ sort temp1 /out=temp2
$ print temp2
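
For comparison, the same three-step, temporary-file approach can be written in the shell itself and checked against the pipeline. This is only a sketch: the sample data is invented, the names temp1 and temp2 mirror the commands above, and the last step reads the result with cat rather than sending anything to a printer.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'chris:x:301' 'al:x:290' 'bob:x:295' > "$tmpdir/passwd"   # invented sample data

# Without pipes: each step writes a temporary file that the next step reads.
cut -d: -f1 < "$tmpdir/passwd" > "$tmpdir/temp1"
sort < "$tmpdir/temp1" > "$tmpdir/temp2"
stepwise=$(cat < "$tmpdir/temp2")

# With pipes: the same result, with no intermediate files to create or delete.
piped=$(cut -d: -f1 < "$tmpdir/passwd" | sort)

echo "$piped"
rm -r "$tmpdir"
```

The two approaches give identical output; the pipeline simply leaves nothing behind to clean up.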

After sufficient practice, you will find yourself routinely typing in powerful command pipelines that do in one line what it would take several commands (and temporary files) in other operating systems to accomplish.