
7.2. String I/O

Now we'll zoom back in to the string I/O level and examine the print, printf, and read statements, which give the shell I/O capabilities that are more analogous to those of conventional programming languages.

7.2.1. print

As we've seen countless times in this book, print simply prints its arguments to standard output. You should use it instead of the echo command, whose functionality differs from system to system.[96] (The Korn shell's built-in version of echo emulates whatever the system's standard version of echo does.) Now we'll explore the print command in greater detail.

[96] Specifically, there is a difference between System V and BSD versions. The latter accepts options similar to those of print, while the former accepts C language-style escape sequences.

7.2.1.2. Options to print

print also accepts a few dash options; we've already seen -n for omitting the final newline. The options are listed in Table 7-3.

Table 7-3. print options

Option      Function
-e          Process escape sequences in the arguments (this is the default).
-f format   Print as if via printf with the given format (see the next section).
-n          Omit the final newline (same as the \c escape sequence).
-p          Print on pipe to coroutine; see Chapter 8.
-r          Raw; ignore the escape sequences listed above.
-R          Like -r, but in addition ignore all other options except -n.
-s          Print to command history file (see Chapter 2).
-un         Print to file descriptor n.

Notice that some of these are redundant: print -n is the same as print with \c at the end of a line; print -un ... is equivalent to print ... >&n (though the former is slightly more efficient).

However, print -s is not the same as print ... >> $HISTFILE. The latter command renders the vi and emacs editing modes temporarily inoperable; you must use print -s if you want to print to your history file.

Printing to your history file is useful if you want to edit something that the shell expands when it processes a command line, for example, a complex environment variable such as PATH. If you enter the command print -s PATH=$PATH, hit ENTER, and then press CTRL-P in emacs-mode (or ESC k in vi-mode), you will see something like this:

$ PATH=/bin:/usr/bin:/etc:/usr/ucb:/usr/local/bin:/home/billr/bin

That is, the shell expands the variable (and anything else, like command substitutions, wildcards, etc.) before it writes the line to the history file. Your cursor will be at the end of the line (or at the beginning of the line in vi-mode), and you can edit your PATH without having to type in the whole thing again.

7.2.2. printf

If you need to produce formatted reports, the shell's print command can be combined with formatting attributes for variables to produce output data that lines up reasonably. But you can only do so much with these facilities.

The C language's printf(3) library routine provides powerful formatting facilities for total control of output. It is so useful that many other Unix-derived programming languages, such as awk and perl, support similar or identical facilities. Primarily because the behavior of echo on different Unix systems could not be reconciled, and recognizing printf's utility, the POSIX shell standard mandates a printf shell-level command that provides the same functionality as the printf(3) library routine. This section describes how the printf command works and examines additional capabilities unique to the Korn shell's version of printf.

The printf command can output a simple string just like the print command.

printf "Hello, world\n"

The main difference that you will notice at the outset is that, unlike print, printf does not automatically supply a newline. You must specify it explicitly as \n.

The full syntax of the printf command has two parts:

printf format-string [arguments ...]

The first part is a string that describes the format specifications; this is best supplied as a string constant in quotes. The second part is an argument list, such as a list of strings or variable values, that correspond to the format specifications. (If there are more arguments than format specifications, ksh cycles through the format specifications in the format string, reusing them in order, until done.) A format specification is preceded by a percent sign (%), and the specifier is one of the characters described shortly. Two of the main format specifiers are %s for strings and %d for decimal integers.
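The recycling of format specifications is convenient for printing lists of pairs. A small sketch (the names and paths are invented for the example):

```shell
# Two %s specifiers but four arguments: printf cycles
# through the format string once per pair of arguments.
printf "%s is in %s\n" vi /usr/bin emacs /usr/local/bin
```

This prints one "name is in path" line per pair.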

The format string combines text to be output literally with specifications describing how to format subsequent arguments on the printf command line. For example:

$ printf "Hello, %s\n" World
Hello, World

Because the printf command is built into the shell, numeric conversions such as %d can evaluate arithmetic expressions; you are not limited to literal numbers:

$ printf "The answer is %d.\n" 12+10+20
The answer is 42.

The allowed specifiers are shown in Table 7-4.

Table 7-4. Format specifiers used in printf

Specifier   Description
%c          ASCII character (prints first character of corresponding argument)
%d          Decimal integer
%i          Decimal integer
%e          Floating-point format ([-]d.precisione[+-]dd) (see Table 7-5 for the meaning of precision)
%E          Floating-point format ([-]d.precisionE[+-]dd)
%f          Floating-point format ([-]ddd.precision)
%g          %e or %f conversion, whichever is shorter, with trailing zeros removed
%G          %E or %f conversion, whichever is shorter, with trailing zeros removed
%o          Unsigned octal value
%s          String
%u          Unsigned decimal value
%x          Unsigned hexadecimal number; uses a-f for 10 through 15
%X          Unsigned hexadecimal number; uses A-F for 10 through 15
%%          Literal %
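A few of these specifiers side by side (the arguments are arbitrary sample values):

```shell
# %c takes the first character of its argument, %o prints
# octal, %x prints hex, and %% emits a literal percent sign.
printf "%c %o %x %d%%\n" hello 8 255 99
```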

The printf command can be used to specify the width and alignment of output fields. A format expression can take three optional modifiers following % and preceding the format specifier:

%flags width.precision format-specifier

The width of the output field is a numeric value. When you specify a field width, the contents of the field are right-justified by default. You must specify a flag of "-" to get left-justification. (The rest of the flags are discussed shortly.) Thus, "%-20s" outputs a left-justified string in a field 20 characters wide. If the string is less than 20 characters, the field is padded with whitespace to fill. In the following examples, a | is output to indicate the actual width of the field. The first example right-justifies the text:

printf "|%10s|\n" hello

It produces:

|     hello|

The next example left-justifies the text:

printf "|%-10s|\n" hello

It produces:

|hello     |

The precision modifier, used for decimal or floating-point values, controls the number of digits that appear in the result. For string values, it controls the maximum number of characters from the string that will be printed.

You can specify both the width and precision dynamically, via values in the printf argument list. You do this by specifying asterisks, instead of literal values.

$ myvar=42.123456
$ printf "|%*.*G|\n" 5 6 $myvar
|42.1235|

In this example, the width is 5, the precision is 6, and the value to print comes from the value of myvar.

The precision is optional. Its exact meaning varies by control letter, as shown in Table 7-5:

Table 7-5. Meaning of precision

Conversion              Precision means
%d, %i, %o, %u, %x, %X  The minimum number of digits to print. When the value has fewer digits, it is padded with leading zeros. The default precision is 1.
%e, %E                  The number of digits printed after the decimal point. The default precision is 6. A precision of 0 inhibits printing of the decimal point.
%f                      The number of digits to the right of the decimal point.
%g, %G                  The maximum number of significant digits.
%s                      The maximum number of characters to print.
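A quick sketch of how precision plays out for the common cases:

```shell
# %.5d pads the integer with leading zeros, %.2f rounds
# to two decimal places, and %.3s truncates the string.
printf "%.5d %.2f %.3s\n" 42 3.14159 hello
```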

Finally, one or more flags may precede the field width and the precision. We've already seen the "-" flag for left-justification. The rest of the flags are shown in Table 7-6.

Table 7-6. Flags for printf

Flag    Description
-       Left-justify the formatted value within the field.
space   Prefix positive values with a space and negative values with a minus.
+       Always prefix numeric values with a sign, even if the value is positive.
#       Use an alternate form: %o has a preceding 0; %x and %X are prefixed with 0x and 0X, respectively; %e, %E, and %f always have a decimal point in the result; and %g and %G do not have trailing zeros removed.
0       Pad output with zeros, not spaces. This happens only when the field width is wider than the converted result. In the C language, this flag applies to all output formats, even non-numeric ones. For ksh, it applies only to the numeric formats.
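The flags combine with widths and specifiers as you would expect; for example:

```shell
# "+" forces a sign on the number, "#" requests the 0x
# prefix for hex, and "0" with a width pads with zeros.
printf "%+d %#x %05d\n" 42 255 42
```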

If printf cannot perform a format conversion, it returns a non-zero exit status.
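You can test for such a failure in the usual way. This sketch suppresses the error message and checks the exit status:

```shell
# A string that is not a valid number fails the %d
# conversion; printf reports this via its exit status.
if ! printf "%d\n" not_a_number >/dev/null 2>&1; then
    echo "conversion failed"
fi
```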

Similar to print, the built-in printf command interprets escape sequences within the format string. However, printf accepts a larger range of escape sequences; they are the same as for the $'...' string. These sequences are listed later in Table 7-9.

7.2.2.1. Additional Korn shell printf specifiers

Besides the standard specifiers just described, the Korn shell accepts a number of additional specifiers. These provide useful features at the expense of nonportability to other versions of the printf command.

%b
When used instead of %s, expands print-style escape sequences in the argument string. For example:
$ printf "%s\n" 'hello\nworld'
hello\nworld
$ printf "%b\n" 'hello\nworld'
hello
world

%H
When used instead of %s, outputs HTML and XML special characters as their corresponding entity names. For example:
$ printf "%s\n" "Here are real < and > characters"
Here are real < and > characters
$ printf "%H\n" "Here are real < and > characters"
Here&nbsp;are&nbsp;real&nbsp;&lt;&nbsp;and&nbsp;&gt;&nbsp;characters

Interestingly enough, spaces are turned into &nbsp;, the nonbreaking space entity of HTML and XML.

%n
This is borrowed from ISO C. It stores the number of characters written so far into the shell variable named by the corresponding argument. This is possible because printf is built into the shell.
$ printf "hello, world\n%n" msglen
hello, world
$ print $msglen
13

%P
When used instead of %s, translates the egrep-style extended regular expression into an equivalent Korn shell pattern. For example:
$ printf "%P\n" '(.*\.o|.*\.obj|core)+'
*+(*\.o|*\.obj|core)*

%q
When used instead of %s, prints the string argument in quotes in such a way that it could later be reused inside a shell script. For example:

$ printf "print %q\n" "a string with ' and \" in it"
print $'a string with \' and " in it'

(The $'...' notation is explained in Section 7.3.3.1, later in this chapter.)

%R
Goes the other way from %P, translating patterns into extended regular expressions. For example:
$ printf "%R\n" '+(*.o|*.c)'
^(.*\.o|.*\.c)+$

%(date format)T
The date format is a format string similar to that accepted by date(1). The argument is a string representing a date and time. ksh converts the given date string into the time it represents and then reformats that time according to the date-style format you supply. ksh accepts a wide variety of date and time formats. For example:
$ date
Wed Jan 30 15:46:01 IST 2002
$ printf "%(It is now %m/%d/%Y %H:%M:%S)T\n" "$(date)"
It is now 01/30/2002 15:46:07

Unix systems keep time in "seconds since the Epoch." The Epoch is midnight, January 1, 1970, UTC. If you have a time value in this format, you can use it with the %T conversion specifier by preceding it with a # character, like so:

$ printf "%(It is now %m/%d/%Y %H:%M:%S)T\n" '#1012398411'
It is now 01/30/2002 15:46:51

%Z
Print a byte whose value is zero.

Finally, for the %d format, after the precision you may supply an additional period and a number indicating the output base:

$ printf '42 is %.3.5d in base 5\n' 42
42 is 132 in base 5

7.2.3. read

The other side of the shell's string I/O facilities is the read command, which allows you to read values into shell variables. The basic syntax is:

read var1 var2 ...

There are a few options, which we cover in Section 7.2.3.5, later in this chapter. This statement takes a line from the standard input and breaks it down into words delimited by any of the characters in the value of the variable IFS (see Chapter 4; these are usually a space, a TAB, and a newline). The words are assigned to variables var1, var2, etc. For example:

$ read fred bob
dave pete
$ print "$fred"
dave

$ print "$bob"
pete

If there are more words than variables, excess words are assigned to the last variable. If you omit the variables altogether, the entire line of input is assigned to the variable REPLY.
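For example, with two variables and four input words (a here-document stands in for typed input here):

```shell
# The first word goes to "first"; all the leftover words
# are assigned to the last variable, "rest".
read first rest <<EOF
uno dos tres cuatro
EOF
echo "$first"
echo "$rest"
```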

You may have identified this as the missing ingredient in the shell programming capabilities we've seen so far. It resembles input statements in conventional languages, like its namesake in Pascal. So why did we wait this long to introduce it?

Actually, read is sort of an escape hatch from traditional shell programming philosophy, which dictates that the most important unit of data to process is a text file, and that Unix utilities such as cut, grep, sort, etc., should be used as building blocks for writing programs.

read, on the other hand, implies line-by-line processing. You could use it to write a shell script that does what a pipeline of utilities would normally do, but such a script would inevitably look like:

while (read a line) do
    process the line
    print the processed line
end

This type of script is usually much slower than a pipeline; furthermore, it has the same form as a program someone might write in C (or some similar language) that does the same thing much, much faster. In other words, if you are going to write it in this line-by-line way, there is no point in writing a shell script. (The authors have gone for years without writing a script with read in it.)

7.2.3.1. Reading lines from files

Nevertheless, shell scripts with read are useful for certain kinds of tasks. One is when you are reading data from a file small enough so that efficiency isn't a concern (say a few hundred lines or less), and it's really necessary to get bits of input into shell variables.

One task that we have already seen fits this description: Task 5-4, the script that a system administrator could use to set a user's TERM environment variable according to which terminal line he or she is using. The code in Chapter 5 used a case statement to select the correct value for TERM.

This code would presumably reside in /etc/profile, the system-wide initialization file that the Korn shell runs before running a user's .profile. If the terminals on the system change over time -- as surely they must -- then the code would have to be changed. It would be better to store the information in a file and change just the file instead.

Assume we put the information in a file whose format is typical of such Unix "system configuration" files: each line contains a device name, a TAB, and a TERM value. If the file, which we'll call /etc/terms, contained the same data as the case statement in Chapter 5, it would look like this:

console s531
tty01   gl35a
tty03   gl35a
tty04   gl35a
tty07   t2000
tty08   s531

We can use read to get the data from this file, but first we need to know how to test for the end-of-file condition. Simple: read's exit status is 1 (i.e., nonzero) when there is nothing to read. This leads to a clean while loop:

TERM=vt99       # assume this as a default
line=$(tty)
while read dev termtype; do
    if [[ $dev == $line ]]; then
        TERM=$termtype
        export TERM
        print "TERM set to $TERM."
        break
    fi
done

The while loop reads each line of the input into the variables dev and termtype. In each pass through the loop, the if looks for a match between $dev and the user's tty ($line, obtained by command substitution from the tty command). If a match is found, TERM is set and exported, a message is printed, and the loop exits; otherwise TERM remains at the default setting of vt99.

We're not quite done, though: this code reads from the standard input, not from /etc/terms! We need to know how to redirect input to multiple commands. There are a few ways of doing this.

7.2.3.2. I/O redirection and multiple commands

One way to solve the problem is with a subshell, as we'll see in the next chapter. This involves creating a separate process to do the reading. However, it is usually more efficient to do it in the same process; the Korn shell gives us three ways of doing this.

The first, which we have seen already, is with a function:

function findterm {
    TERM=vt99       # assume this as a default
    line=$(tty)
    while read dev termtype; do
        if [[ $dev == $line ]]; then
            TERM=$termtype
            export TERM
            print "TERM set to $TERM."
            break
        fi
    done
}

findterm < /etc/terms

A function acts like a script in that it has its own set of standard I/O descriptors, which can be redirected in the line of code that calls the function. In other words, you can think of this code as if findterm were a script and you typed findterm < /etc/terms on the command line. The read statement takes input from /etc/terms a line at a time, and the function runs correctly.

The second way is by putting the I/O redirector at the end of the loop, like this:

TERM=vt99       # assume this as a default
line=$(tty)
while read dev termtype; do
    if [[ $dev == $line ]]; then
        TERM=$termtype
        export TERM
        print "TERM set to $TERM."
        break
    fi
done < /etc/terms

You can use this technique with any flow-control construct, including if...fi, case...esac, for...done, select...done, and until...done. This makes sense because these are all compound statements that the shell treats as single commands for these purposes. This technique works fine -- the read command reads a line at a time -- as long as all of the input is done within the compound statement.
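For instance, an if...fi can take a redirector in exactly the same way (the temporary file here is just for the sketch):

```shell
# Create a throwaway input file, then redirect it into
# an entire if...fi compound statement.
printf 'alpha\nbeta\n' > /tmp/lines.$$
if read firstline; then
    echo "first line: $firstline"
fi < /tmp/lines.$$
rm -f /tmp/lines.$$
```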

Putting the I/O redirector at the end is particularly important for making loops work correctly. Suppose you place the redirector after the read command, like so:

while read dev termtype < /etc/terms
do
    ...
done

In this case, the shell reopens /etc/terms each time around the loop, reading the first line over and over again. This effectively creates an infinite loop, something you probably don't want.

7.2.3.3. Code blocks

Occasionally, you may want to redirect I/O to or from an arbitrary group of commands without creating a separate process. To do that, you need to use a construct that we haven't seen yet. If you surround some code with { and },[97] the code will behave like a function that has no name. This is another type of compound statement. In accordance with the equivalent concept in the C language, we'll call this a block of code.[98]

[97] For obscure, historical syntactic reasons, the braces are shell keywords. In practice, this means that the closing } must be preceded by either a newline or a semicolon. Caveat emptor!

[98] LISP programmers may prefer to think of this as an anonymous function or lambda-function.

What good is a block? In this case, it means that the code within the curly braces ({ }) will take standard I/O descriptors just as we described for functions. This construct is also appropriate for the current example because the code needs to be called only once, and the entire script is not really large enough to merit breaking down into functions. Here is how we use a block in the example:

{
    TERM=vt99       # assume this as a default
    line=$(tty)
    while read dev termtype; do
        if [[ $dev == $line ]]; then
            TERM=$termtype
            export TERM
            print "TERM set to $TERM."
            break
        fi
    done
} < /etc/terms

To help you understand how this works, think of the curly braces and the code inside them as if they were one command, i.e.:

{ TERM=vt99; line=$(tty); while ... ; } < /etc/terms

Configuration files for system administration tasks like this one are actually fairly common; a prominent example is /etc/hosts, which lists machines that are accessible in a TCP/IP network. We can make /etc/terms more like these standard files by allowing comment lines in the file that start with #, just as in shell scripts. This way /etc/terms can look like this:

#
# System Console is a Shande 531s
console s531
#
# Prof. Subramaniam's line has a Givalt GL35a
tty01   gl35a
...

We can handle comment lines in two ways. First, we could modify the while loop so that it ignores lines beginning with #. We would take advantage of the fact that the equality and inequality operators (== and !=) under [[...]] do pattern matching, not just equality testing:

if [[ $dev != \#* && $dev == $line ]]; then
    ...

The pattern is #*, which matches any string beginning with #. We must precede # with a backslash so that the shell doesn't treat the rest of the line as a comment. Also, remember from Chapter 5 that the && combines the two conditions so that both must be true for the entire condition to be true.
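Here is the comment test in isolation, showing how the pattern distinguishes a comment from a data line (the sample values are invented):

```shell
# \#* matches any string starting with #, so the !=
# comparison is true only for non-comment lines.
for dev in 'tty01' '# a comment'; do
    if [[ $dev != \#* ]]; then
        echo "data:    $dev"
    else
        echo "comment: $dev"
    fi
done
```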

This would certainly work, but the usual way to filter out comment lines is to use a pipeline with grep. We give grep the regular expression ^[^#], which matches anything except lines beginning with #. Then we change the call to the block so that it reads from the output of the pipeline instead of directly from the file.[99]

[99] Unfortunately, using read with input from a pipe is often very inefficient, because of issues in the design of the shell that aren't relevant here.

grep "^[^#]" /etc/terms | {
    TERM=vt99
    ...
}

We can also use read to improve our solution to Task 6-3, in which we emulate the multicolumn output of ls. In the solution in the previous chapter, we assumed for simplicity that filenames are limited to 14 characters, and we used 14 as a fixed column width. We'll improve the solution so that it allows any filename length (as in modern Unix versions) and uses the length of the longest filename (plus 2) as the column width.

In order to display the list of files in multicolumn format, we need to read through the output of ls twice. In the first pass, we find the longest filename and use that to set the number of columns as well as their width; the second pass does the actual output. Here is a block of code for the first pass:

ls "$@" | {
    let width=0
    while read fname; do
        if (( ${#fname} > $width )); then
            let width=${#fname}
        fi
    done
    let "width += 2"
    let numcols="int(${COLUMNS:-80} / $width)"
}

This code looks a bit like an exercise from a first-semester programming class. The while loop goes through the input looking for files with names that are longer than the longest found so far; if a longer one is found, its length is saved as the new longest length.

After the loop finishes, we add 2 to the width to allow for space between columns. Then we divide the width of the terminal by the column width to get the number of columns. As the shell does division in floating-point, the result is passed to the int function to produce an integer final result. Recall from Chapter 3 that the built-in variable COLUMNS often contains the display width; the construct ${COLUMNS:-80} gives a default of 80 if this variable is not set.

The results of the block are the variables width and numcols. These are global variables, so they are accessible by the rest of the code inside our (eventual) script. In particular, we need them in our second pass through the filenames. The code for this resembles the code of our original solution; all we need to do is replace the fixed column width and number of columns with the variables:

set -A filenames $(ls "$@")
typeset -L$width fname
let count=0

while (( $count < ${#filenames[*]} )); do
    fname=${filenames[$count]}
    print "$fname  \c"
    let count++
    if [[ $((count % numcols)) == 0 ]]; then
         print          # output a newline
    fi
done

if (( count % numcols != 0 )); then
    print
fi

The entire script consists of both pieces of code. As yet another "exercise for the reader," consider how you might rearrange the code to only invoke the ls command once. (Hint: use at least one arithmetic for loop.)

7.2.3.4. Reading user input

The other type of task to which read is suited is prompting a user for input. Think about it: we have hardly seen any such scripts so far in this book. In fact, the only ones were the modified solutions to Task 5-4, which involved select.

As you've probably figured out, read can be used to get user input into shell variables. We can use print to prompt the user, like this:

print -n 'terminal? '
read TERM
print "TERM is $TERM"

Here is what this looks like when it runs:

terminal? vt99
TERM is vt99

However, in order that prompts don't get lost down a pipeline, shell convention dictates that prompts should go to standard error, not standard output. (Recall that select prompts to standard error.) We could just use file descriptor 2 with the output redirector we saw earlier in this chapter:

print -n 'terminal? ' >&2
read TERM
print TERM is $TERM

The shell provides a better way of doing the same thing: if you follow the first variable name in a read statement with a question mark (?) and a string, the shell uses that string as a prompt to standard error. In other words:

read TERM?'terminal? '
print "TERM is $TERM"

does the same as the above. The shell's way is better for the following reasons. First, this looks a bit nicer; second, the shell knows not to generate the prompt if the input is redirected to come from a file; and finally, this scheme allows you to use vi- or emacs-mode on your input line.

We'll flesh out this simple example by showing how Task 5-4 would be done if select didn't exist. Compare this with the code in Chapter 6:

set -A termnames gl35a t2000 s531 vt99
print 'Select your terminal type:'
while true;  do
    {
        print '1) gl35a'
        print '2) t2000'
        print '3) s531'
        print '4) vt99'
    } >&2
    read REPLY?'terminal? '

    if (( REPLY >= 1 && REPLY <= 4 )); then
        TERM=${termnames[REPLY-1]}
        print "TERM is $TERM"
        export TERM
        break
    fi
done

The while loop is necessary so that the code repeats if the user makes an invalid choice.

This is roughly twice as many lines of code as the first solution in Chapter 5 -- but exactly as many as the later, more user-friendly version! This shows that select saves you code only if you don't mind using the same strings to display your menu choices as you use inside your script.

However, select has other advantages, including the ability to construct multicolumn menus if there are many choices, and better handling of empty user input.

7.2.3.5. Options to read

read takes a set of options that are similar to those for print. Table 7-7 lists them.

Table 7-7. read options

Option        Function
-A            Read words into an indexed array, starting at index 0. Unsets all elements of the array first.
-d delimiter  Read up to the character delimiter, instead of the default, which is a newline.
-n count      Read at most count bytes.[100]
-p            Read from pipe to coroutine; see Chapter 8.
-r            Raw; do not treat \ as the line continuation character.
-s            Save input in the command history file; see Chapter 2.
-t nseconds   Wait up to nseconds seconds for input; if nothing is entered in that time, return a failure exit status.
-un           Read from file descriptor n.

[100] This option was added in ksh93e.

Having to type read word[0] word[1] word[2] ... to read words into an array is painful. It is also error-prone; if the user types more words than you've provided array variables, the remaining words are all assigned to the last array variable. The -A option gets around this, reading each word one at a time into the corresponding entries in the named array.

The -d option lets you read up to a character other than newline. In practical terms, you will rarely need to do this, but the shell makes it possible for the occasions when a different record delimiter is convenient.

Similarly, the -n option frees you from the default line-oriented way that read consumes input; it allows you to read a fixed number of bytes. This is very useful if you're processing legacy fixed-width data, although this is not very common on Unix systems.

read lets you input lines that are longer than the width of your display device by providing backslash (\) as a continuation character, just as in shell scripts. The -r option to read overrides this, in case your script reads from a file that may contain lines that happen to end in backslashes.

read -r also preserves any other escape sequences the input might contain. For example, if the file fred contains this line:

A line with a\n escape sequence

read -r fredline will include the backslash in the variable fredline, whereas without the -r, read will "eat" the backslash. As a result:

$ read -r fredline < fred
$ print "$fredline"
A line with a
 escape sequence
$

(Here, print interpreted the \n escape sequence and turned it into a newline.) However:

$ read fredline < fred
$ print "$fredline"
A line with an escape sequence
$

The -s option helps you if you are writing a highly interactive script and you want to provide the same command-history capability as the shell itself has. For example, say you are writing a new version of mail as a shell script. Your basic command loop might look like this:

while read -s cmd; do
    # process the command
done

Using read -s allows the user to retrieve previous commands to your program with the emacs-mode CTRL-P command or the vi-mode ESC k command. The kshdb debugger in Chapter 9 uses this feature.

The -t option is quite useful. It allows you to recover in case your user has "gone out to lunch," but your script has better things to do than just wait around for input. You tell it how many seconds you're willing to wait before deciding that the user just doesn't care anymore:

print -n "OK, Mr. $prisoner, enter your name, rank and serial number: "
# wait two hours, no more
if read -t $((60 * 60 * 2)) name rank serial
then
    # process information
    ...
else
    # prisoner is being silent
    print 'The silent treatment, eh? Just you wait.'
    call_evil_colonel -p $prisoner
    ...
fi

If the user enters data before the timeout expires, read returns 0 (success), and the then part of the if is processed. On the other hand, when the user enters nothing, the timeout expires and read returns 1 (failure), executing the else part of the statement.

Although not an option to the read command, the TMOUT variable can affect it. Just as for select, if TMOUT is set to a number representing some number of seconds, the read command times out if nothing is entered within that time, and returns a failure exit status. The -t option overrides the setting of TMOUT.

Finally, the -un option is useful in scripts that read from more than one file at the same time.

Task 7-4 is an example of this that also uses the n< I/O redirector that we saw earlier in this chapter.

Task 7-4

Write a script that prints the contents of two files side by side.

We'll format the output so the two output columns are fixed at 30 characters wide. Here is the code:

typeset -L30 f1 f2
while read -u3 f1 && read -u4 f2; do
    print "$f1$f2"
done 3<$1 4<$2

read -u3 reads from file descriptor 3, and 3<$1 directs the file given as first argument to be input on that file descriptor; the same is true for the second argument and file descriptor 4. Remember that file descriptors 0, 1, and 2 are already used for standard I/O. We use file descriptors 3 and 4 for our two input files; it's best to start from 3 and work upwards to the shell's limit, which is 9.

The typeset command and the quotes around the argument to print ensure that the output columns are 30 characters wide and that trailing whitespace in the lines from the file is preserved. The while loop reads one line from each file until at least one of them runs out of input.

Assume the file dave contains the following:

DAVE
Height: 177.8 cm.
Weight: 79.5 kg.
Hair: brown
Eyes: brown

And the file shirley contains this:

SHIRLEY
Height: 167.6 cm.
Weight: 65.5 kg.
Hair: blonde
Eyes: blue

If the script is called twocols, then twocols dave shirley produces this output:

DAVE                          SHIRLEY
Height: 177.8 cm.             Height: 167.6 cm.
Weight: 79.5 kg.              Weight: 65.5 kg.
Hair: brown                   Hair: blonde
Eyes: brown                   Eyes: blue




Copyright © 2003 O'Reilly & Associates. All rights reserved.