for (Learning the Korn Shell, 2nd Edition)

5.2. for

The most obvious enhancement we could make to the previous script is the ability to report on multiple files instead of just one. Tests like -e and -d only take single arguments, so we need a way of calling the code once for each file given on the command line.

The way to do this -- indeed, the way to do many things with the Korn shell -- is with a looping construct. The simplest and most widely applicable of the shell's looping constructs is the for loop. We'll use for to enhance fileinfo soon.

The for loop allows you to repeat a section of code a fixed number of times. During each time through the code (known as an iteration), a special variable called a loop variable is set to a different value; this way each iteration can do something slightly different.

The for loop is somewhat, but not entirely, similar to its counterparts in conventional languages like C and Pascal. The chief difference is that the shell's for loop doesn't let you specify a number of times to iterate or a range of values over which to iterate; instead, it only lets you give a fixed list of values. In other words, with the normal for loop, you can't do anything like this Pascal-type code, which executes statements 10 times:

for x := 1 to 10 do
begin
    statements ...
end

(You need the arithmetic for loop, which we'll see in Chapter 6, to do that.)
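As a minimal sketch of the fixed-list approach, the loop below spells out the values 1 through 10 by hand, which is the closest the list-based for comes to the Pascal fragment. The names total and x are only illustrative, and printf stands in for print so that the sketch also runs under other POSIX shells.

```shell
# Sum the numbers 1 through 10 by listing every value explicitly.
total=0
for x in 1 2 3 4 5 6 7 8 9 10
do
    total=$((total + x))
done
printf 'sum of 1..10: %d\n' "$total"    # prints: sum of 1..10: 55
```

Listing the values is workable for ten items but clearly does not scale, which is why the arithmetic for loop exists.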

However, the for loop is ideal for working with arguments on the command line and with sets of files (e.g., all files in a given directory). We'll look at an example of each of these. But first, here is the syntax for the for construct:

for name [in list]
do
    statements that can use $name ...
done

The list is a list of names. (If in list is omitted, the list defaults to "$@", i.e., the quoted list of command-line arguments, but we always supply the in list for the sake of clarity.) In our solutions to the following task, we show two simple ways to specify lists.
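To illustrate the default, here is a small sketch in which the in list is left out, so the loop runs over the function's own arguments. The helper name show_args is hypothetical, and the sketch uses POSIX function syntax and printf (rather than the function keyword and print) so that it is not tied to ksh.

```shell
# show_args is a hypothetical helper.
show_args() {
    # No "in list" here, so the loop runs over "$@",
    # i.e., the arguments passed to the function.
    for arg
    do
        printf 'arg: %s\n' "$arg"
    done
}

show_args one "two words" three
```

Because the default is the quoted "$@", the argument "two words" stays in one piece and the loop body runs three times.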

In ksh93 there is an interesting interaction between the for loop and nameref variables (see Chapter 4). If the control variable is a nameref, then each element in the list of names can be a different shell variable, and the shell assigns the nameref to each variable in turn. For example:

$ first="I am first"                                  Initialize test variables
$ second="I am in the middle"
$ third="I am last"
$ nameref refvar=first                                Create nameref
$ for refvar in first second third ; do               Loop over variables
>   print "refvar -> ${!refvar}, value: $refvar"      Print referenced var, value
> done
refvar -> first, value: I am first
refvar -> second, value: I am in the middle
refvar -> third, value: I am last
$ print ${!refvar}, $refvar                           Show final state
third, I am last

The for loop is instrumental for solving Task 5-2.

Task 5-2

You work in an environment with several computers in a local network. Write a shell script that tells you who is logged in to each machine on the network.

The command finger(1) can be used (among other things) to find the names of users logged into a remote system; the command finger @systemname does this. Its output depends on the version of Unix, but it looks something like this:

[motet.early.com]
Trying 127.146.63.17...
-User-    -Full name-       -What- Idle TTY -Console Location-
hildy    Hildegard von Bingen  ksh   2d5h p1  jem.cal (Telnet)
mikes    Michael Schultheiss   csh   1:21 r4  ncd2.cal (X display 0)
orlando  Orlando di Lasso      csh     28 r7  maccala (Telnet)
marin    Marin Marais          mush  1:02 pb  mussell.cal (Telnet)
johnd    John Dowland          tcsh    17 p0  nugget.west.nobis. (X Window)

In this output, motet.early.com is the full network name of the remote machine.

Assume the systems in your network are called fred, bob, dave, and pete. Then the following code would do the trick:

for sys in fred bob dave pete
do
    finger @$sys
    print
done

This works no matter which system you are currently logged into. It prints output for each machine similar to the above, with blank lines in between.

A slightly better solution would be to store the names of the systems in an environment variable. This way, if systems are added to your network and you need a list of their names in more than one script, you need change them in only one place. If a variable's value is several words separated by spaces (or tabs), for will treat it as a list of words, provided the variable is expanded without double quotes.

Here is the improved solution. First, put lines in your .profile or environment file that define the variable SYSNAMES and make it an environment variable:

SYSNAMES="fred bob dave pete"
export SYSNAMES

Then, the script can look like this:

for sys in $SYSNAMES
do
    finger @$sys
    print
done
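The splitting of a variable's value into words can be checked with a short sketch like the one below. It only counts iterations and never calls finger; the counter names unquoted and quoted are illustrative, and printf is used instead of print so that the sketch runs in any POSIX shell.

```shell
SYSNAMES="fred bob dave pete"

# Unquoted expansion: the value is split into four separate words.
unquoted=0
for sys in $SYSNAMES
do
    unquoted=$((unquoted + 1))
done

# Quoted expansion: the value stays a single word containing spaces.
quoted=0
for sys in "$SYSNAMES"
do
    quoted=$((quoted + 1))
done

printf 'unquoted: %d, quoted: %d\n' "$unquoted" "$quoted"    # prints: unquoted: 4, quoted: 1
```

This is why the script writes $SYSNAMES without double quotes: quoting it would make the loop run once, with sys set to the entire string.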

The foregoing illustrates a simple use of for, but it's much more common to use for to iterate through a list of command-line arguments. To show this, we can enhance the fileinfo script above to accept multiple arguments. First, we write a bit of "wrapper" code that does the iteration:

for filename in "$@" ; do
    # quote the argument so that names containing spaces stay intact
    finfo "$filename"
    print
done

Next, we make the original script into a function called finfo:[73]

function finfo {
    if [[ ! -e $1 ]]; then
        print "file $1 does not exist."
        return 1
    fi
    ...
}

[73] A function can have the same name as a script; however, this isn't good programming practice.

The complete script consists of the for loop code and the above function. Because the function must be defined before it can be used, the function definition must go first, or else it should be in a directory listed in both PATH and FPATH.
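The ordering can be sketched as follows. This is not the full script: the finfo here is a cut-down stand-in that keeps only the existence test shown above and adds one directory test, and the loop is wrapped in a hypothetical function, report_all, purely so that the sketch can be run with sample arguments. It uses POSIX syntax ([ ... ], printf) so that it is not tied to ksh.

```shell
# Part 1: the function definition comes first.
# This finfo is a simplified stand-in, not the full function from the text.
finfo() {
    if [ ! -e "$1" ]; then
        printf 'file %s does not exist.\n' "$1"
        return 1
    fi
    if [ -d "$1" ]; then
        printf '%s is a directory.\n' "$1"
    else
        printf '%s is not a directory.\n' "$1"
    fi
}

# Part 2: the wrapper loop, which may now call finfo.
# report_all is a hypothetical name; in the script itself the loop
# would sit at top level and iterate over the script's own "$@".
report_all() {
    for filename in "$@"; do
        finfo "$filename"
        printf '\n'
    done
}

report_all / .
```

If the two parts were swapped, the shell would reach the call to finfo before it had read the definition and would look for a command of that name instead.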

The fileinfo script works as follows: in the for statement, "$@" is a list of all positional parameters. For each argument, the body of the loop is run with filename set to that argument. In other words, the function finfo is called once for each value of $filename as its first argument ($1). The call to print after the call to finfo merely prints a blank line between sets of information about each file.
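The reason for writing "$@" with the quotes shows up as soon as an argument contains a space. The sketch below compares it with an unquoted $*; the helper names count_quoted and count_unquoted are hypothetical, and the default value of IFS is assumed.

```shell
# Count how many times the loop body runs for each form of the list.
count_quoted() {
    n=0
    for arg in "$@"; do n=$((n + 1)); done
    printf '%d\n' "$n"
}

count_unquoted() {
    n=0
    for arg in $*; do n=$((n + 1)); done
    printf '%d\n' "$n"
}

count_quoted "my file" notes.txt      # prints 2
count_unquoted "my file" notes.txt    # prints 3
```

With "$@", each positional parameter reaches the loop as one word, so a file named "my file" is reported on once rather than being treated as two nonexistent files.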

Given a directory with the same files as the previous example, typing fileinfo * would produce the following output:

bob is a regular file.
you own the file.
you have read permission on the file.
you have write permission on the file.
you have execute permission on the file.

custom.tbl is a regular file.
you own the file.
you have read permission on the file.
you have write permission on the file.

exp is a directory that you may search.
you own the file.
you have read permission on the file.
you have write permission on the file.

lpst is a regular file.
you do not own the file.
you have read permission on the file.

Task 5-3 is a programming task that exploits the other major use of for.

Task 5-3

Your Unix system has the ability to transfer files from an MS-DOS system, but it leaves the MS-DOS filenames intact. Write a script that translates the filenames in a given directory from MS-DOS format to a more Unix-friendly format.

Filenames in the old Microsoft MS-DOS system have the format FILENAME.EXT. FILENAME can be up to eight characters long; EXT is an extension that can be up to three characters. Letters are all uppercase. We want to do the following:

1. Translate letters from uppercase to lowercase.
2. If the extension is null, remove the dot.

The first tool we will need for this job is the Unix tr(1) utility, which translates characters on a one-to-one basis.[74] Given the arguments charset1 and charset2, it translates characters in the standard input that are members of charset1 into corresponding characters in charset2. The two sets are ranges of characters enclosed in square brackets ([...] in standard regular-expression form in the manner of grep, awk, ed, etc.). More to the point, tr '[A-Z]' '[a-z]' takes its standard input, converts uppercase letters to lowercase, and writes the converted text to the standard output.[75] (The quotes keep the shell from treating the brackets as filename wildcards.)

[74] As we will see in Chapter 6, it is possible to do the case translation within the shell, without using an external program. We'll ignore that fact for now, though.

[75] Modern POSIX-compliant systems support locales, which are ways of using non-ASCII character sets in a portable fashion. On such a system, the correct invocation of tr is tr '[:upper:]' '[:lower:]'. Most long-time Unix users tend to forget this, though.
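A minimal sketch of the translation step on its own, using the character-class form from the footnote. The helper name to_lower is hypothetical, and printf takes the place of print so that the sketch runs in any POSIX shell.

```shell
# to_lower is a hypothetical helper: it feeds its argument to tr
# and writes the lowercase result to standard output.
to_lower() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

newfilename=$(to_lower "DOSFILE.TXT")
printf '%s\n' "$newfilename"    # prints dosfile.txt
```

Characters that are not uppercase letters, such as the dot, pass through tr unchanged.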

That takes care of the first step in the translation process. We can use a Korn shell string operator to handle the second. Here is the code for a script we'll call dosmv:

for filename in ${1:+$1/}* ; do
    newfilename=$(print "$filename" | tr '[A-Z]' '[a-z]')
    newfilename=${newfilename%.}
    print "$filename -> $newfilename"
    # quote both names so that embedded spaces don't split them
    mv "$filename" "$newfilename"
done

The * in the for construct is not the same as $*. It's a wildcard, i.e., all files in a directory.

This script accepts a directory name as argument, the default being the current directory. The expression ${1:+$1/} evaluates to the argument ($1) with a slash appended if the argument is supplied, or the null string if it isn't supplied. So the entire expression ${1:+$1/}* evaluates to all files in the given directory, or all files in the current directory if no argument is given.
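The behavior of the ${1:+$1/} expansion can be seen in isolation with a sketch such as this one. The helper name dir_prefix is hypothetical; it prints the result between square brackets so that an empty result is visible.

```shell
# dir_prefix is a hypothetical helper that shows the expansion by itself.
dir_prefix() {
    printf '[%s]\n' "${1:+$1/}"
}

dir_prefix dosfiles    # prints [dosfiles/]
dir_prefix             # prints []
dir_prefix ""          # prints [] too: with the colon, a null value counts as unset
```

Appending * to the first result gives dosfiles/*, and appending it to the empty result gives plain *, which is how the script falls back to the current directory.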

Therefore, filename takes on the value of each filename in the list. filename gets translated into newfilename in two steps. (We could have done it in one, but readability would have suffered.) The first step uses tr in a pipeline within a command substitution construct. Our old friend print makes the value of filename the standard input to tr. tr's output becomes the value of the command substitution expression, which is assigned to newfilename. Thus, if $filename were DOSFILE.TXT, newfilename would become dosfile.txt.

The second step uses one of the shell's pattern-matching operators, the one that deletes the shortest match it finds at the end of the string. The pattern here is ., which means a dot at the end of the string.[76] This means that the expression ${newfilename%.} will delete a dot from $newfilename only if it's at the end of the string; otherwise the expression will leave $newfilename intact. For example, if $newfilename is dosfile.txt, it will be untouched, but if it's dosfile., the expression will change it to dosfile without the final dot. In either case, the new value is assigned back to newfilename.

[76] Unix regular expression mavens should remember that this is shell wildcard syntax, in which dots are not operators and therefore do not need to be backslash-escaped.
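A short sketch of the % operator with the pattern used in the script. The helper name strip_dot is hypothetical, and printf is used in place of print for portability across POSIX shells.

```shell
# strip_dot is a hypothetical helper that applies ${...%.} to its argument.
strip_dot() {
    printf '%s\n' "${1%.}"
}

strip_dot dosfile.txt    # prints dosfile.txt (the dot is not at the end)
strip_dot dosfile.       # prints dosfile
strip_dot dosfile..      # prints dosfile.   (only one trailing dot is removed)
```

The last call shows that the pattern . matches exactly one literal dot, so only a single trailing dot is deleted.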

The last statement in the for loop body does the file renaming with the standard Unix mv(1) command. Before that, a print command simply informs the user of what's happening.

There is one little problem with this solution: if there are any files in the given directory that aren't MS-DOS files (in particular, if there are files whose names don't contain uppercase letters and don't end with a dot), then the conversion will do nothing to those filenames and mv will be called with two identical arguments. mv will complain with the message: mv: filename and filename are identical. The solution is very simple: test to see if the filenames are identical:

for filename in ${1:+$1/}* ; do
    newfilename=$(print "$filename" | tr '[A-Z]' '[a-z]')
    newfilename=${newfilename%.}
    # subtlety: quote value of $newfilename to do string comparison,
    # not wildcard pattern match
    if [[ $filename != "$newfilename" ]]; then
        print "$filename -> $newfilename"
        mv "$filename" "$newfilename"
    fi
done
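The quoting subtlety in the comparison can be demonstrated with a name that contains wildcard characters. The sketch below uses case rather than [[ ... ]] so that it runs in any POSIX shell; case follows the same rule, in that a quoted expansion is compared literally and an unquoted one is treated as a pattern. The helper names literal_match and pattern_match are hypothetical.

```shell
# Both helpers return 0 on a match and 1 otherwise.
literal_match() {
    case $1 in
        "$2") return 0 ;;    # quoted: $2 is compared character for character
        *)    return 1 ;;
    esac
}

pattern_match() {
    case $1 in
        $2) return 0 ;;      # unquoted: $2 is interpreted as a wildcard pattern
        *)  return 1 ;;
    esac
}

name='file[1]'
if literal_match "$name" "$name"; then
    printf 'quoted:   %s equals itself\n' "$name"
fi
if ! pattern_match "$name" "$name"; then
    printf 'unquoted: %s does not match itself as a pattern\n' "$name"
fi
if pattern_match file1 "$name"; then
    printf 'unquoted: file1 matches the pattern %s\n' "$name"
fi
```

All three messages are printed. Without the quotes, a filename such as file[1] would be compared against the pattern file[1], which matches only file1, so the script would wrongly conclude that the old and new names differ.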

If you are familiar with an operating system other than MS-DOS and Unix, you may want to test your script-writing prowess at this point by writing a script that translates filenames from that system's format into Unix format. Use the above script as a guideline.

In particular, if you know the OpenVMS operating system from Compaq (nee DEC), here's a programming challenge:

1. Write a script called vmsmv that is similar to dosmv but works on OpenVMS filenames instead of MS-DOS filenames. Remember that OpenVMS filenames end with semicolons and version numbers.
2. Modify your script so that if there are several versions of the same file, it renames only the latest version (with the highest version number).
3. Modify it further so that your script erases old versions of files.

The first of these is a relatively straightforward modification of dosmv. Number 2 is difficult; here's a strategy hint:

1. Develop a regular expression that matches OpenVMS filenames (you need this for Number 1 anyway).

2. Get a list of base names (sans version numbers) of files in the given directory by piping ls through grep (with the above regular expression), cut, and sort -u. Use cut with a semicolon as "field separator"; make sure that you quote the semicolon so that the shell doesn't treat it as a statement separator. sort -u removes duplicates after sorting. Use command substitution to save the resulting list in a variable.

3. Use a for loop on the list of base names. For each name, get the highest version number of the file (just the number, not the whole name). Do this with another pipeline: pipe ls through cut, sort -n, and tail -1. sort -n sorts in numerical (not lexicographical) order; tail -N outputs the last N lines of its input. Again, use command substitution to capture the output of this pipeline in a variable.

4. Append the highest version number to the base name; this is the file to rename in Unix format.
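The two pipelines in the hint can be tried out on sample data before any files are touched. The following is only a sketch under several assumptions: the filenames are invented, they are fed in on standard input in place of ls output, awk is swapped in for the grep step so that base names are compared exactly, and tail -n 1 is used as the current spelling of tail -1. The function name latest_versions is hypothetical, and the renaming itself is left out.

```shell
# latest_versions reads names of the form BASE;VERSION on standard input
# and prints, for each base name, the name with the highest version number.
latest_versions() {
    names=$(cat)

    # Base names without version numbers, duplicates removed.
    base_names=$(printf '%s\n' "$names" | cut -d ';' -f 1 | sort -u)

    for base in $base_names; do
        # Version numbers for this base name, sorted numerically;
        # the last line is the highest.
        latest=$(printf '%s\n' "$names" |
            awk -F ';' -v b="$base" '$1 == b { print $2 }' |
            sort -n | tail -n 1)
        printf '%s;%s\n' "$base" "$latest"
    done
}

printf '%s\n' 'REPORT.TXT;1' 'REPORT.TXT;12' 'REPORT.TXT;3' 'NOTES.DOC;2' |
    latest_versions
```

For this input the function prints NOTES.DOC;2 and REPORT.TXT;12. The numeric sort matters here: a lexicographical sort would put 3 after 12 and pick the wrong version.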

Once you have completed Number 2, you can do Number 3 by adding a single line of code to your script; see if you can figure out how.

Finally, ksh93 provides the arithmetic for loop, which is much closer in syntax and style to the C for loop. We present it in the next chapter, after discussing the shell's general arithmetic capabilities.