[Chapter 5] 5.2 for

The most obvious enhancement we could make to the previous script is the ability to report on multiple files instead of just one. Tests like -a and -d only take single arguments, so we need a way of calling the code once for each file given on the command line.

The way to do this (indeed, the way to do many things with the Korn shell) is with a looping construct. The simplest and most widely applicable of the shell's looping constructs is the for loop. We'll use for to enhance fileinfo soon.

The for loop allows you to repeat a section of code a fixed number of times. During each time through the code (known as an iteration), a special variable called a loop variable is set to a different value; this way each iteration can do something slightly different.

The for loop is somewhat, but not entirely, similar to its counterparts in conventional languages like C and Pascal. The chief difference is that the shell's for loop doesn't let you specify a number of times to iterate or a range of values over which to iterate; instead, it only lets you give a fixed list of values. In other words, you can't do anything like this Pascal-type code, which executes statements 10 times:

for x := 1 to 10 do
begin
    statements...
end

(You need the while construct, which we'll see soon, to construct this type of loop. You also need the ability to do integer arithmetic, which we will see in Chapter 6, Command-line Options and Typed Variables.)
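As a rough preview, here is a minimal sketch of two ways to run a loop body 10 times: with an explicitly written list of values, and with while plus integer arithmetic. The function names count_with_list and count_with_while are made up for illustration, and printf stands in for print so that the sketch also runs in other Bourne-compatible shells.

```shell
# Hypothetical helper: iterate over a fixed, explicitly written list.
count_with_list() {
    for x in 1 2 3 4 5 6 7 8 9 10
    do
        printf '%s\n' "iteration $x"
    done
}

# Hypothetical helper: the same loop driven by while and arithmetic.
count_with_while() {
    x=1
    while [ "$x" -le 10 ]
    do
        printf '%s\n' "iteration $x"
        x=$((x + 1))
    done
}

count_with_list
count_with_while
```

Both functions print the same 10 lines; the difference is that the first must spell out every value, which is exactly the limitation of for described above.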

However, the for loop is ideal for working with arguments on the command line and with sets of files (e.g., all files in a given directory). We'll look at an example of each of these. But first, we'll show the syntax for the for construct:

for name [in list]
do
    statements that can use $name...
done

The list is a list of names. (If in list is omitted, the list defaults to "$@", i.e., the quoted list of command-line arguments, but we'll always supply the in list for the sake of clarity.) In our solutions to the following task, we'll show two simple ways to specify lists.
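To make the default concrete, here is a small sketch in which in list is left out. The function show_args is hypothetical; inside a function the positional parameters are the function's own arguments, so the loop walks them just as a script would walk its command-line arguments. printf is used in place of print so the sketch is not tied to the Korn shell.

```shell
# Hypothetical function: with no "in list", for walks the positional
# parameters, as if we had written: for arg in "$@"
show_args() {
    for arg
    do
        printf '<%s>\n' "$arg"
    done
}

show_args one "two words" three
```

The argument "two words" arrives in the loop as a single value, which is what the quotes in "$@" are for.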

Task 5.2

You work in an environment with several computers in a local network. Write a shell script that tells you who is logged in to each machine on the network.

The command finger(1) can be used (among other things) to find the names of users logged into a remote system; the command finger @systemname does this. Its output depends on the version of UNIX, but it looks something like this:

[motet.early.com]
Trying 127.146.63.17...
-User-    -Full name-       -What- Idle TTY -Console Location-
hildy    Hildegard von Bingen  ksh   2d5h p1  jem.cal (Telnet)
mikes    Michael Schultheiss   csh   1:21 r4  ncd2.cal (X display 0)
orlando  Orlando di Lasso      csh     28 r7  maccala (Telnet)
marin    Marin Marais          mush  1:02 pb  mussell.cal (Telnet)
johnd    John Dowland          tcsh    17 p0  nugget.west.nobis. (X Window)

In this output, motet.early.com is the full network name of the remote machine.

Assume the systems in your network are called fred, bob, dave, and pete. Then the following code would do the trick:

for sys in fred bob dave pete
do
    finger @$sys
    print
done

This works no matter which of the systems you are currently logged into. It prints output for each machine similar to the above, with blank lines in between.

A slightly better solution would be to store the names of the systems in an environment variable. This way, if systems are added to your network and you need a list of their names in more than one script, you need to change them in only one place. If a variable's value is several words separated by blanks (or TABs), for will treat it as a list of words, as long as the variable is not enclosed in double quotes.

Here is the improved solution. First, put lines in your .profile or environment file that define the variable SYSNAMES and make it an environment variable:

SYSNAMES="fred bob dave pete"
export SYSNAMES

Then, the script can look like this:

for sys in $SYSNAMES
do
    finger @$sys
    print
done
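The way for splits the variable's value into words can be tried without a network. In this sketch the call to finger is replaced by a printf that only reports what would be run, and the two function names are invented for the demonstration.

```shell
SYSNAMES="fred bob dave pete"

# Unquoted: the value is split into four words, one per iteration.
list_systems() {
    for sys in $SYSNAMES
    do
        printf 'would run: finger @%s\n' "$sys"
    done
}

# Quoted: the whole value is a single word, so the loop runs once.
list_systems_quoted() {
    for sys in "$SYSNAMES"
    do
        printf 'would run: finger @%s\n' "$sys"
    done
}

list_systems
list_systems_quoted
```

The first function prints four lines, one per system; the second prints a single line containing all four names, which is why the script above leaves $SYSNAMES unquoted.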

The foregoing illustrated a simple use of for, but it's much more common to use for to iterate through a list of command-line arguments. To show this, we can enhance the fileinfo script above to accept multiple arguments. First, we write a bit of "wrapper" code that does the iteration:

for filename in "$@" ; do
    finfo "$filename"
    print
done

Next, we make the original script into a function called finfo: [11]

function finfo {
    if [[ ! -a $1 ]]; then
        print "file $1 does not exist."
        return 1
    fi
    ...
}

[11] A function can have the same name as a script; however, this isn't good programming practice.

The complete script consists of the for loop code and the above function, in either order; good programming style dictates that the function definition should go first.

The fileinfo script works as follows: in the for statement, "$@" is a list of all positional parameters. For each argument, the body of the loop is run with filename set to that argument. In other words, the function finfo is called once for each value of $filename as its first argument ($1). The call to print after the call to finfo merely prints a blank line between sets of information about each file.
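The calling pattern can be watched in isolation with a stand-in for finfo. In this sketch the stand-in only echoes the argument it received (it is not the real function), and fileinfo_demo is a hypothetical function that plays the part of the script body; printf replaces print so the sketch runs in other Bourne-compatible shells too.

```shell
# Stand-in for the real finfo: it just shows the argument it received.
finfo() {
    printf 'finfo called with: [%s]\n' "$1"
}

# Hypothetical wrapper playing the role of the fileinfo script body.
fileinfo_demo() {
    for filename in "$@"; do
        finfo "$filename"
        printf '\n'
    done
}

fileinfo_demo bob custom.tbl "my notes"
```

Each argument produces one call followed by a blank line, and an argument containing a blank reaches the function in one piece because both "$@" and "$filename" are quoted.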

Given a directory with the same files as the previous example, typing fileinfo * would produce the following output:

bob is a regular file.
you own the file.
you have read permission on the file.
you have write permission on the file.
you have execute permission on the file.

custom.tbl is a regular file.
you own the file.
you have read permission on the file.
you have write permission on the file.

exp is a directory that you may search.
you own the file.
you have read permission on the file.
you have write permission on the file.

lpst is a regular file.
you do not own the file.
you have read permission on the file.

Here is a programming task that exploits the other major use of for .

Task 5.3

Your UNIX system has the ability to transfer files from an MS-DOS system, but it leaves the DOS filenames intact. Write a script that translates the filenames in a given directory from DOS format to a more UNIX-friendly format.

DOS filenames have the format FILENAME.EXT. FILENAME can be up to eight characters long; EXT is an extension that can be up to three characters. The dot is required even if the extension is null; letters are all uppercase. We want to do the following:

Translate letters from uppercase to lowercase.
If the extension is null, remove the dot.

The first tool we will need for this job is the UNIX tr(1) utility, which translates characters on a one-to-one basis. Given the arguments charset1 and charset2, it will translate characters in the standard input that are members of charset1 into corresponding characters in charset2. The two sets are ranges of characters enclosed in square brackets ([] in standard regular-expression form, in the manner of grep, awk, ed, etc.). More to the point, tr [A-Z] [a-z] takes its standard input, converts uppercase letters to lowercase, and writes the converted text to the standard output. (It is safest to quote the two sets, as in tr '[A-Z]' '[a-z]', so that the shell does not try to expand them as wildcards.)
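Here is a minimal sketch of that translation applied to a single name. The helper to_lower is hypothetical, and printf is used instead of print to feed the name to tr.

```shell
# Hypothetical helper: lowercase one name by piping it through tr.
# The character sets are quoted so the shell passes them to tr untouched.
to_lower() {
    printf '%s\n' "$1" | tr '[A-Z]' '[a-z]'
}

to_lower 'DOSFILE.TXT'   # prints: dosfile.txt
to_lower 'README.'       # prints: readme.
```

Note that tr leaves the dot alone in both cases; removing a trailing dot is a separate step.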

That takes care of the first step in the translation process. We can use a Korn shell string operator to handle the second. Here is the code for a script we'll call dosmv:

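# Loop over every file in the given directory (default: current directory).
for filename in "${1:+$1/}"* ; do
    # Lowercase the name, then strip a trailing dot if there is one.
    newfilename=$(print "$filename" | tr '[A-Z]' '[a-z]')
    newfilename=${newfilename%.}
    print "$filename -> $newfilename"
    mv "$filename" "$newfilename"
done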

The * in the for construct is not the same as $*. It's a wildcard, i.e., all files in a directory.

This script accepts a directory name as argument, the default being the current directory. The expression ${1:+$1/} evaluates to the argument ($1) with a slash appended if the argument is supplied, or the null string if it isn't supplied. So the entire expression ${1:+$1/}* evaluates to all files in the given directory, or all files in the current directory if no argument is given.
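The two cases are easy to check on their own. In this sketch dir_prefix is a hypothetical helper that prints only the part of the expression in front of the wildcard, and the directory name incoming is made up.

```shell
# Hypothetical helper: print the prefix that precedes the * in dosmv.
dir_prefix() {
    printf '%s\n' "${1:+$1/}"
}

dir_prefix            # no argument: prints an empty line
dir_prefix incoming   # prints: incoming/
```

Because the operator is :+ (with the colon), an argument that is present but empty is treated the same as a missing one.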

Therefore, filename takes on the value of each filename in the list. filename gets translated into newfilename in two steps. (We could have done it in one, but readability would have suffered.) The first step uses tr in a pipeline within a command substitution construct. Our old friend print makes the value of filename the standard input to tr. tr's output becomes the value of the command substitution expression, which is assigned to newfilename. Thus, if $filename were DOSFILE.TXT, newfilename would become dosfile.txt.

The second step uses one of the shell's pattern-matching operators, the one that deletes the shortest match it finds at the end of the string. The pattern here is ., which means a dot at the end of the string. [12] This means that the expression ${newfilename%.} will delete a dot from $newfilename only if it's at the end of the string; otherwise the expression will leave $newfilename intact. For example, if $newfilename is dosfile.txt, it will be untouched, but if it's dosfile., the expression will change it to dosfile without the final dot. In either case, the new value is assigned back to newfilename.

[12] UNIX regular expression mavens should remember that this is shell wildcard syntax, in which dots are not operators and therefore do not need to be backslash-escaped.
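The behavior of the % operator with this pattern can be confirmed with a few sample names. The helper strip_final_dot below is hypothetical and exists only to show the expansion.

```shell
# Hypothetical helper: apply the same ${...%.} expansion that dosmv uses.
strip_final_dot() {
    name=$1
    printf '%s\n' "${name%.}"
}

strip_final_dot dosfile.txt   # unchanged: dosfile.txt
strip_final_dot dosfile.      # final dot removed: dosfile
strip_final_dot a.b.          # only the last dot goes: a.b
```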

The last statement in the for loop body does the file renaming with the standard UNIX mv(1) command. Before that, a print command simply informs the user of what's happening.

There is one little problem with the solution above: if there are any files in the given directory that aren't DOS files (in particular, if there are files whose names contain no uppercase letters and don't end in a dot), then the conversion will do nothing to those filenames and mv will be called with two identical arguments. mv will complain with the message: mv: filename and filename are identical. We can solve this problem by letting grep determine whether each file has a DOS filename or not. The grep regular expression:

[^a-z]\{1,8\}\.[^a-z]\{0,3\}

is adequate (for these purposes) for matching DOS-format filenames. [13] The character class [^a-z] means "any character except a lowercase letter." [14] So the entire regular expression means: "Between 1 and 8 characters that are not lowercase letters, followed by a dot, followed by 0 to 3 characters that are not lowercase letters."

[13] As with the lsd function in Chapter 4, Basic Shell Programming, older BSD-derived versions of UNIX don't support the "repeat count" operator within grep. You must use this instead (with egrep, because plain grep treats ? as an ordinary character):
[^a-z][^a-z]?[^a-z]?[^a-z]?[^a-z]?[^a-z]?[^a-z]?[^a-z]?\.[^a-z]?[^a-z]?[^a-z]?
[14] To be completely precise, this class also excludes NEWLINEs.

When grep runs, it normally prints all of the lines in its standard input that match the pattern you give it as argument. But we only need it to test whether or not the pattern is matched. Luckily, grep's exit status is "well-behaved": it's 0 if there is a match in the input, 1 if not. Therefore, we can use the exit status to test for a match. We also need to discard grep's output; to do this, we redirect it to the special file /dev/null, which is colloquially known as the "bit bucket." [15] Any output directed to /dev/null effectively disappears. Thus, the command line:

print "$filename" | grep '[^a-z]\{1,8\}\.[^a-z]\{0,3\}' > /dev/null

prints nothing and returns exit status 0 if the filename is in DOS format, 1 if not.

[15] Some Berkeley-derived versions of UNIX have a -s ("silent") option to grep that suppresses standard output, thereby making redirection to /dev/null unnecessary.
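Since an if statement tests exit status, the command line can be used directly as a condition. The following sketch wraps it in a hypothetical function, is_dos_name, and tries it on a few sample names; printf takes the place of print, and the regular expression is the one given in the text.

```shell
dos_regexp='[^a-z]\{1,8\}\.[^a-z]\{0,3\}'

# Hypothetical helper: succeed (exit status 0) if the name looks like DOS.
is_dos_name() {
    printf '%s\n' "$1" | grep "$dos_regexp" > /dev/null
}

for name in DOSFILE.TXT README. notes lpst
do
    if is_dos_name "$name"; then
        printf '%s\n' "$name: DOS format"
    else
        printf '%s\n' "$name: not DOS format"
    fi
done
```

The first two names match; notes and lpst contain no dot at all, so grep exits with status 1 for them.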

Now we can modify our dosmv script to incorporate this code:

dos_regexp='[^a-z]\{1,8\}\.[^a-z]\{0,3\}'
for filename in "${1:+$1/}"* ; do
    # Rename only the files whose names look like DOS filenames.
    if print "$filename" | grep "$dos_regexp" > /dev/null; then
        newfilename=$(print "$filename" | tr '[A-Z]' '[a-z]')
        newfilename=${newfilename%.}
        print "$filename -> $newfilename"
        mv "$filename" "$newfilename"
    fi
done

For readability reasons, we use the variable dos_regexp to hold the DOS filename-matching regular expression.
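Because the script renames files, it is worth trying on throwaway data first. One way to do that, sketched here under some assumptions, is to put the loop in a function (dosmv_demo, a made-up name), create a scratch directory with mktemp -d, and run it there. printf replaces print so the trial also works in shells other than the Korn shell, and the variables are quoted as a precaution.

```shell
# Hypothetical function form of dosmv, for experimenting safely.
dosmv_demo() {
    dos_regexp='[^a-z]\{1,8\}\.[^a-z]\{0,3\}'
    for filename in "${1:+$1/}"*; do
        if printf '%s\n' "$filename" | grep "$dos_regexp" > /dev/null; then
            newfilename=$(printf '%s\n' "$filename" | tr '[A-Z]' '[a-z]')
            newfilename=${newfilename%.}
            printf '%s\n' "$filename -> $newfilename"
            mv "$filename" "$newfilename"
        fi
    done
}

# Try it in a scratch directory so no real files are touched.
scratch=$(mktemp -d)
(
    cd "$scratch" || exit 1
    touch DOSFILE.TXT README. notes
    dosmv_demo
    ls
)
rm -r "$scratch"
```

After the run, the listing should show dosfile.txt, notes, and readme: the two DOS-style names are converted, and notes is skipped without any complaint from mv.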

If you are familiar with an operating system other than DOS and UNIX, you may want to test your script-writing prowess at this point by writing a script that translates filenames from that system's format into UNIX format. Use the above script as a guideline.

In particular, if you know DEC's VAX/VMS operating system, here's a programming challenge:

1. Write a script called vmsmv that is similar to dosmv but works on VAX/VMS filenames instead of DOS filenames. Remember that VAX/VMS filenames end with semicolons and version numbers.
2. Modify your script so that if there are several versions of the same file, it renames only the latest version (with the highest version number).
3. Modify further so that your script erases old versions of files.

The first of these is a relatively straightforward modification of dosmv. Number 2 is difficult; here's a strategy hint:

Develop a regular expression that matches VAX/VMS filenames (you need this for No. 1 anyway).
Get a list of base names (sans version numbers) of files in the given directory by piping ls through grep (with the above regular expression), cut, and sort -u. Use cut with a semicolon as "field separator"; make sure that you quote the semicolon so that the shell doesn't treat it as a statement separator. sort -u removes duplicates after sorting. Use command substitution to save the resulting list in a variable.
Use a for loop on the list of base names. For each name, get the highest version number of the file (just the number, not the whole name). Do this with another pipeline: pipe ls through cut, sort -n, and tail -1. sort -n sorts in numerical (not lexicographical) order; tail -N outputs the last N lines of its input. Again, use command substitution to capture the output of this pipeline in a variable.
Append the highest version number to the base name; this is the file to rename in UNIX format.
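If you want to see the shape of the two pipelines before writing your own, here is one possible sketch that works on made-up data rather than a real directory. Everything in it is an assumption for illustration: sample_listing stands in for the output of ls, vms_regexp is a deliberately loose pattern (anything ending in a semicolon and digits), grep -F is used to pick out the lines for one base name, and tail -n 1 is the newer spelling of tail -1.

```shell
# Hypothetical sample listing standing in for the output of ls.
sample_listing() {
    printf '%s\n' 'LOGIN.COM;3' 'REPORT.TXT;1' 'REPORT.TXT;10' \
        'REPORT.TXT;2' 'notes'
}

# Assumed, deliberately loose pattern: anything ending in ;digits.
vms_regexp=';[0-9][0-9]*$'

# Base names without version numbers, duplicates removed.
base_names() {
    sample_listing | grep "$vms_regexp" | cut -d';' -f1 | sort -u
}

# Highest version number for one base name (the number only).
highest_version() {
    sample_listing | grep -F "$1;" | cut -d';' -f2 | sort -n | tail -n 1
}

for base in $(base_names)
do
    printf '%s\n' "$base;$(highest_version "$base")"
done
```

On the sample data this prints LOGIN.COM;3 and REPORT.TXT;10. Note that sort -n places 10 after 2, which a plain lexicographical sort would not do. Turning the sketch into a working vmsmv is still left as the exercise.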

Once you have completed No. 2, you can do No. 3 by adding a single line of code to your script; see if you can figure out how.