Shell Variables (Learning the Korn Shell, 2nd Edition)

The easiest way to check a variable's value is to use the print built-in command.[38] All print does is print its arguments, but not until the shell has evaluated them. This includes -- among other things that will be discussed later -- taking the values of variables and expanding filename wildcards. So, if the variable fred has the value bob, typing print "$fred" causes the shell to simply print bob.

3.4.1. Variables and Quoting

Notice that we used double quotes around variables (and strings containing them) in these print examples. In Chapter 1 we said that some special characters inside double quotes are still interpreted (while none are interpreted inside single quotes).

Perhaps the most important special character that "survives" double quotes is the dollar sign -- meaning that variables are evaluated. It's possible to do without the double quotes in some cases; for example, we could have written the above print command this way:

print The value of \$varname is \"$varname\".

But double quotes are more generally correct.

Here's why. Suppose we did this:

fred='Four spaces between these    words.'

Then if we entered the command print $fred, the result would be:

Four spaces between these words.

What happened to the extra spaces? Without the double quotes, the shell splits the string into words after substituting the variable's value, as it normally does when it processes command lines. The double quotes circumvent this part of the process (by making the shell think that the whole quoted string is a single word).

Therefore the command print "$fred" prints this:

Four spaces between these    words.
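To see the word splitting directly, the following sketch counts how many arguments a command receives with and without the quotes. The count_args function is a hypothetical helper, not a built-in, and printf is used in place of print only so that the sketch also runs in other POSIX-style shells.

```shell
fred='Four spaces between these    words.'

# count_args is a hypothetical helper: it reports how many arguments it received.
count_args() {
    printf '%s\n' "$#"
}

unquoted=$(count_args $fred)      # the shell splits the value into separate words
quoted=$(count_args "$fred")      # the quotes keep the value as a single word

printf 'unquoted: %s argument(s)\n' "$unquoted"
printf 'quoted:   %s argument(s)\n' "$quoted"
```

Without the quotes, the command sees each word as its own argument, and the run of spaces between them is gone before the command ever starts.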

This becomes especially important when we start dealing with variables that contain user or file input later on. In particular, it's increasingly common to find directories made available on Unix systems via the network from Apple Macintosh and Microsoft Windows systems, where spaces and other unusual characters are common in filenames.

Double quotes also allow other special characters to work, as we'll see in Chapter 4, Chapter 6, and Chapter 7. But for now, we'll revise the "When in doubt, use single quotes" rule in Chapter 1 by adding, "...unless a string contains a variable, in which case you should use double quotes."

3.4.2. Built-in Variables

As with options, some built-in shell variables are meaningful to general Unix users, while others are arcana for professional programmers. We'll look at the more generally useful ones here, and we'll save some of the more obscure ones for later chapters. Again, Appendix B contains a complete list.

3.4.2.1. Editing mode variables

Several shell variables relate to the command-line editing modes that we saw in the previous chapter. These are listed in Table 3-2.

The first two of these are sometimes used by text editors and other screen-oriented programs, which rely on the variables being set correctly. Although the Korn shell and most windowing systems should know how to set them correctly, you should look at the values of COLUMNS and LINES if you are having display trouble with a screen-oriented program.

Table 3-2. Editing mode variables

Variable	Meaning
`COLUMNS`	Width, in character columns, of your terminal. The standard value is 80 (sometimes 132), though if you are using a windowing system like X, you could give a terminal window any size you wish.
`LINES`	Length of your terminal in text lines. The standard value for terminals is 24, but for IBM PC-compatible monitors it's 25; once again, if you are using a windowing system, you can usually resize to any amount.
`HISTFILE`	Name of history file on which the editing modes operate.
`EDITOR`	Pathname of your favorite text editor; the suffix (`macs`[39] or `vi`) determines which editing mode to use.
`VISUAL`	Similar to `EDITOR`; if set, used in preference to `EDITOR` to choose editing mode.
`HISTEDIT`	Pathname of editor to use with the hist command.

[39] This suffix also works if your editor is a different version of Emacs whose name doesn't end in emacs.

3.4.2.2. Mail variables

Since the mail program is not running all the time, there is no way for it to inform you when you get new mail; therefore the shell does this instead.[40] The shell can't actually check for incoming mail, but it can look at your mail file periodically and determine whether the file has been modified since the last check. The variables listed in Table 3-3 let you control how this works.

[40] The commonly available biff command does a better job of this; while the Korn shell only prints "you have mail" messages right before it prints command prompts, biff can tell you who the mail is from.

Table 3-3. Mail variables

Variable	Meaning
`MAIL`	Name of file to check for incoming mail (i.e., your mail file)
`MAILCHECK`	How often, in seconds, to check for new mail (default 600 seconds, or 10 minutes)
`MAILPATH`	List of filenames, separated by colons (`:`), to check for incoming mail
`_` (underscore)	When used inside `$MAILPATH`, name of mail file that changed; see text for other uses

Under the simplest scenario, you use the standard Unix mail program, and your mail file is /var/mail/yourname or something similar. In this case, you would just set the variable MAIL to this filename if you want your mail checked:

MAIL=/var/mail/yourname

If your system administrator hasn't already done it for you, put a line like this in your .profile.

However, some people use nonstandard mailers that use multiple mail files; MAILPATH was designed to accommodate this. The Korn shell uses the value of MAIL as the name of the file to check, unless MAILPATH is set, in which case the shell checks each file in the MAILPATH list for new mail. You can use this mechanism to have the shell print a different message for each mail file: for each mail filename in MAILPATH, append a question mark followed by the message you want printed.

For example, let's say you have a mail system that automatically sorts your mail into files according to the username of the sender. You have mail files called /var/mail/you/fritchie, /var/mail/you/droberts, /var/mail/you/jphelps, etc. You define your MAILPATH as follows:

MAILPATH=/var/mail/you/fritchie:/var/mail/you/droberts:\
/var/mail/you/jphelps

If you get mail from Jennifer Phelps, the file /var/mail/you/jphelps changes. The Korn shell notices the change within 10 minutes and prints the message:

you have mail in /var/mail/you/jphelps.

If you are in the middle of running a command, the shell waits until the command finishes (or is suspended) to print the message. To customize this further, you could define MAILPATH to be:

MAILPATH=\
/var/mail/you/fritchie?You have mail from Fiona.:\
/var/mail/you/droberts?Mail from Dave has arrived.:\
/var/mail/you/jphelps?There is new mail from Jennifer.

The backslashes at the end of each line allow you to continue your command on the next line. But be careful: you can't indent subsequent lines. Now, if you get mail from Jennifer, the shell prints:

There is new mail from Jennifer.

Within the message parts of MAILPATH, you may use the special variable _ (underscore) for the name of the file that is triggering the message:

MAILPATH='/var/mail/you/fritchie?You have mail from Fiona in $_.'
MAILPATH+=':/var/mail/you/droberts?Mail from Dave has arrived, check $_.'
MAILPATH+=':/var/mail/you/jphelps?There is new mail from Jennifer, look at $_.'
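If the list of mail files grows, the same value can be assembled in a loop. This is only a sketch: the maildir variable and the wording of the message are assumptions made for illustration, and the sender names are the ones from the example above.

```shell
maildir=/var/mail/you      # assumed location of the per-sender mail files
MAILPATH=

for sender in fritchie droberts jphelps; do
    # \$_ keeps the dollar sign literal, so $_ is expanded later,
    # when the message is printed, not when MAILPATH is assigned.
    entry="$maildir/$sender?New mail from $sender in \$_."
    # Add a colon separator only when MAILPATH already holds an entry.
    MAILPATH="${MAILPATH:+$MAILPATH:}$entry"
done

printf '%s\n' "$MAILPATH"
```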

The meaning of $_ actually varies depending on where and how it's used:

Inside the value of MAILPATH

As just described, use $_ for the name of the file that triggers a message in the value of MAILPATH.

The last argument of the last interactive command

When used on a command line entered interactively, $_ represents the last word on the previous command line:

$ print hi          # Run a command
hi
$ print $_          # Verify setting of $_
hi
$ print hello       # New last argument
hello
$ print $_
hello
$ print "hi there"  # Usage is word based
hi there
$ print $_
hi there

This usage of $_ is similar to the !$ feature of the C shell's history mechanism.

Inside a script

When accessed from inside a shell script, $_ is the full pathname used to find and invoke the script:

$ cat /tmp/junk       # Show test program
print _ is $_
$ PATH=/tmp:$PATH     # Add directory to PATH
$ junk                # Run the program
_ is /tmp/junk

3.4.2.3. Prompting variables

If you have seen enough experienced Unix users at work, you may already have realized that the shell's prompt is not engraved in stone. It seems as though one of the favorite pastimes of professional Unix programmers is thinking of cute or innovative prompt strings. We'll give you some of the information you need to do your own here; the rest comes in the next chapter.

Actually, the Korn shell uses four prompt strings. They are stored in the variables PS1, PS2, PS3, and PS4. The first of these is called the primary prompt string; it is your usual shell prompt, and its default value is "$ " (a dollar sign followed by a space). Many people like to set their primary prompt string to something containing their login name. Here is one way to do this:

PS1="($LOGNAME)-> "

LOGNAME is another built-in shell variable, which is set to your login name when you log in.[41] So, PS1 becomes a left parenthesis, followed by your login name, followed by ")-> ". If your login name is fred, your prompt string will be "(fred)-> ".

[41] Some very old systems use USER instead. Thankfully, such systems are becoming more and more rare with time.

If you are a C shell user and, like many such people, are used to having a command number in your prompt string, the Korn shell can do this similarly to the C shell: if there is an exclamation point in the prompt string, it substitutes the command number. Thus, if you define your prompt string to be the following, your prompts will look like (fred 1)->, (fred 2)->, and so on:

PS1="($LOGNAME !)->"

Perhaps the most useful way to set up your prompt string is so that it always contains your current directory. Then you needn't type pwd to remember where you are. Putting your directory in the prompt is more complicated than the above examples, because your current directory changes during your login session, unlike your login name and the name of your machine. But we can accommodate this by taking advantage of the different kinds of quotes. Here's how:

PS1='($PWD)-> '

The difference is the single quotes, instead of double quotes, surrounding the string on the right side of the assignment. The trick is that this string is evaluated twice: once when the assignment to PS1 is done (in your .profile or environment file) and then again after every command you enter. Here's what each of these evaluations does:

The first evaluation observes the single quotes and returns what is inside them without further processing. As a result, PS1 contains the string ($PWD)-> .

After every command, the shell evaluates ($PWD)->. PWD is a built-in variable that is always equal to the current directory, so the result is a primary prompt that always contains the current directory.[42]

[42] The shell also does command and arithmetic substitution on the value of PS1, but we haven't covered those features yet. See Chapter 6.

We'll discuss the subtleties of quoting and delayed evaluation in more depth in Chapter 7.
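The difference between the two kinds of quotes can be checked outside of a prompt. The sketch below is an illustration only: it uses eval to stand in for the second evaluation that the shell performs on PS1, the variable names early, late, and late_now are made up for the example, and it assumes that /tmp exists.

```shell
cd /
early="($PWD)-> "     # double quotes: $PWD is expanded now, at assignment time
late='($PWD)-> '      # single quotes: the literal text $PWD is stored

cd /tmp
# Stand-in for the shell re-evaluating a prompt string after each command.
eval "late_now=\"$late\""

printf '%s\n' "$early"       # still shows the directory from assignment time
printf '%s\n' "$late_now"    # shows the directory at the time of evaluation
```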

PS2 is called the secondary prompt string; its default value is "> " (a greater-than sign followed by a single space). It is used when you type an incomplete line and hit ENTER, as an indication that you must finish your command. For example, assume that you start a quoted string but don't close the quote. Then if you hit ENTER, the shell prints > and waits for you to finish the string:

$ x="This is a long line,            PS1 for the command
> which is terminated down here"     PS2 for the continuation
$                                    PS1 for the next command

PS3 and PS4 relate to shell programming and debugging, respectively; they are explained in Chapter 5 and Chapter 9.

3.4.2.4. Using history command numbers

The current history command number is available in the HISTCMD environment variable. You can see the current history number in your prompt by placing a ! (or $HISTCMD) somewhere in the value of the PS1 variable:

$ PS1="command !> "
command 42> ls -FC *.xml
appa.xml  appd.xml  ch01.xml  ch04.xml  ch07.xml  ch10.xml
appb.xml  appf.xml  ch02.xml  ch05.xml  ch08.xml  colo1.xml
appc.xml  ch00.xml  ch03.xml  ch06.xml  ch09.xml  copy.xml
command 43>

To get a literal ! into the value of your prompt, place !! into PS1.

3.4.2.5. Terminal types

Today, the most common use of the shell is from inside a terminal emulator window displayed on the high resolution screen of a workstation or PC. However, the terminal emulator program still does emulate the facilities provided by the actual serial CRT terminals of yesteryear. As such, the shell variable TERM is vitally important for any program that uses your entire window, like a text editor. Such programs include traditional screen editors (such as vi and Emacs), pager programs like more, and countless third-party applications.

Because users are spending more and more time within programs and less and less using the shell itself, it is extremely important that your TERM is set correctly. It's really your system administrator's job to help you do this (or to do it for you), but in case you need to do it yourself, here are a few guidelines.

The value of TERM must be a short character string with lowercase letters that appears as a filename in the terminfo database.[43] This database is a two-tiered directory of files under the root directory /usr/share/terminfo.[44] This directory contains subdirectories with single-character names; these in turn contain files of terminal information for all terminals whose names begin with that character. Each file describes how to tell the terminal in question to do certain common things like position the cursor on the screen, go into reverse video, scroll, insert text, and so on. The descriptions are in binary form (i.e., not readable by humans).

[43] Versions of Unix not derived from System V use termcap, an older-style database of terminal capabilities that uses the single text file /etc/termcap for all terminal descriptions. Modern systems often have both the /etc/termcap file and the terminfo database available. Current BSD systems use a single-file indexed database, /usr/share/misc/termcap.db.

[44] This is the typical location on modern systems. Older systems have it in /usr/lib/terminfo.

The name of a terminal description file is the same as the name of the terminal being described; sometimes an abbreviation is used. For example, the DEC VT100 has a description in the file /usr/share/terminfo/v/vt100; the GNU/Linux character-based console has a description in the file /usr/share/terminfo/l/linux. An xterm terminal window under the X Window System has a description in /usr/share/terminfo/x/xterm.

Sometimes your Unix software will not set up TERM correctly; this often happens for X terminals and PC-based Unix systems. Therefore, you should check the value of TERM by typing print $TERM before going any further. If you find that your Unix system isn't setting the right value for you (especially likely if your terminal is of a different make than your computer), you need to find the appropriate value of TERM yourself.

The best way to find the TERM value -- if you can't find a local guru to do it for you -- is to guess the terminfo name and search for a file of that name under /usr/share/terminfo by using ls. For example, if your terminal is a Blivitz BL-35A, you could try:

$ cd /usr/share/terminfo
$ ls b/bl*

If you are successful, you will see something like this:

bl35a           blivitz35a

In this case, the two names are likely to be synonyms for (links to) the same terminal description, so you could use either one as a value of TERM. In other words, you could put either of these two lines in your .profile:

TERM=bl35a
TERM=blivitz35a

If you aren't successful, ls won't print anything, and you will have to make another guess and try again. If you find that terminfo contains nothing that resembles your terminal, all is not lost. Consult your terminal's manual to see if the terminal can emulate a more popular model; nowadays the odds of this are excellent.

Conversely, terminfo may have several entries that relate to your terminal, for submodels, special modes, etc. If you have a choice of which entry to use as your value of TERM, we suggest you test each one out with your text editor or any other screen-oriented programs you use and see which one works best.
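The lookup described above can also be scripted. The following is a minimal sketch, assuming the two-tiered directory layout described earlier; has_terminfo is a hypothetical helper, and the default directory is the typical location named in the text, which may differ on your system.

```shell
# has_terminfo NAME [DIR]: succeed if DIR/<first letter of NAME>/NAME exists.
# DIR defaults to /usr/share/terminfo (an assumption; older systems differ).
has_terminfo() {
    name=$1
    dir=${2:-/usr/share/terminfo}
    first=${name%"${name#?}"}     # strip all but the first character of the name
    [ -f "$dir/$first/$name" ]
}

if has_terminfo "${TERM:-unknown}"; then
    printf 'found a terminfo entry for %s\n' "${TERM:-unknown}"
else
    printf 'no terminfo entry found for %s\n' "${TERM:-unknown}"
fi
```

A check like this only confirms that a description file with that name exists; it does not tell you whether the description actually matches your terminal.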

The process is much simpler if you are using a windowing system, in which your "terminals" are logical portions of the screen rather than physical devices. In this case, operating system-dependent software was written to control your terminal window(s), so the odds are very good that if it knows how to handle window resizing and complex cursor motion, it is capable of dealing with simple things like TERM. The X Window System, for example, automatically sets "xterm" as its value for TERM in an xterm terminal window.

3.4.2.6. Command search path

Another important variable is PATH, which helps the shell find the commands you enter.

As you probably know, every command you use is actually a file that contains code for your machine to run.[45] These files are called executable files or just executables for short. They are stored in various directories. Some directories, like /bin or /usr/bin, are standard on all Unix systems; some depend on the particular version of Unix you are using; some are unique to your machine; if you are a programmer, some may even be your own. In any case, there is no reason why you should have to know where a command's executable file is in order to run it.

[45] Unless it's a built-in command (like cd and print), in which case the code is simply part of the executable file for the entire shell.

That is where PATH comes in. Its value is a list of directories that the shell searches every time you enter a command name that does not contain a slash; the directory names are separated by colons (:), just like the files in MAILPATH. For example, if you type print $PATH, you will see something like this:

/sbin:/usr/sbin:/usr/bin:/etc:/usr/X11R6/bin:/local/bin

Why should you care about your path? There are three main reasons. First, there are security aspects to its value, which we touch on shortly. Second, once you have read the later chapters of this book and you try writing your own shell programs, you will want to test them and eventually set aside a directory for them. Third, your system may be set up so that certain "restricted" commands' executable files are kept in directories that are not listed in PATH. For example, there may be a directory /usr/games in which there are executables that are verboten during regular working hours.

Therefore you may want to add directories to the default PATH you get when you log in. Let's say you have created a bin directory under your login directory for your own shell scripts and programs. To add this directory to your PATH so that it is there every time you log in, put this line in your .profile:

PATH="$PATH:$HOME/bin"

This sets PATH to whatever it was before, followed immediately by a colon and $HOME/bin (your personal bin directory). This is a rather typical usage. (Using $HOME lets your system administrator move your home directory around, without your having to fix your .profile file.)
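If your .profile can be read more than once in a session, the plain assignment adds the same directory again each time. One way around this, sketched here with a hypothetical helper named path_append, is to append the directory only when it is not already listed.

```shell
# path_append DIR: add DIR to the end of PATH unless it is already an element.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                    # already present: leave PATH alone
        *)        PATH="$PATH:$1" ;;    # otherwise append it after a colon
    esac
}

path_append "$HOME/bin"
path_append "$HOME/bin"     # the second call changes nothing
```

Wrapping PATH in colons before matching lets one pattern cover a directory at the start, the middle, or the end of the list.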

There is an important additional detail to understand about how PATH works. This has to do with empty (or "null") elements in the PATH. A null element can occur in one of three ways: placing a lone colon at the front of PATH, placing a lone colon at the end of PATH, or placing two adjacent colons in the middle of PATH. The shell treats a null element in PATH as a synonym for ".", the current directory, and searches in whatever directory you happen to be in at that point in the path search.

PATH=:$HOME/bin:/usr/bin:/usr/local/bin   # Search current directory first
PATH=$HOME/bin:/usr/bin:/usr/local/bin:   # Search current directory last
PATH=$HOME/bin::/usr/bin:/usr/local/bin   # Search current directory second
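Null elements are easy to overlook when reading a long PATH by eye. As a rough sketch, the hypothetical helper below reports whether a PATH-style string contains a null element in any of the three positions, or an explicit "." element.

```shell
# path_searches_cwd STRING: succeed if STRING has a null or "." element.
# Wrapping the string in colons turns a leading or trailing colon into "::".
# An empty string also matches, since it wraps to "::".
path_searches_cwd() {
    case ":$1:" in
        *::*|*:.:*) return 0 ;;
        *)          return 1 ;;
    esac
}

if path_searches_cwd "$PATH"; then
    printf 'PATH includes the current directory\n'
else
    printf 'PATH does not include the current directory\n'
fi
```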

Finally, if you need to know which directory a command comes from, you need not look at directories in your PATH until you find it. The shell built-in command whence prints the full pathname of the command you give it as argument, or just the command's name if it's a built-in command itself (like cd), an alias, or a function (as we'll see in Chapter 4).

3.4.2.7. PATH security considerations

How you set up your PATH variable can have important implications for security.

First, having the current directory in your path is a real security hole, especially for system administrators, and the root account should never have a null element (or explicit dot) in its search path. Why? Consider someone who creates a shell script named, for example, ls, makes it executable, and places it in a directory that root might cd to, such as /tmp:

rm -f /tmp/ls           # Hide the evidence
/bin/ls "$@"            # Run real ls
nasty stuff here        # Silently run other stuff as root

If root has the current directory first in PATH, then cd /tmp; ls does whatever the miscreant wants, and root is none the wiser. (This is known in the security world as a "trojan horse.") While less serious for regular users, there are many experts who would still advise against having the current directory in PATH.

Secondly, the safest way to add your personal bin to PATH is at the end. When you enter a command, the shell searches directories in the order they appear in PATH until it finds an executable file. Therefore, if you have a shell script or program whose name is the same as an existing command, the shell will use the existing command -- unless you type in the command's full pathname to disambiguate. For example, if you have created your own version of the more command in $HOME/bin and your PATH has $HOME/bin at the end, to get your version you will need to type $HOME/bin/more (or just ~/bin/more).

The more reckless way of resetting your path is to tell the shell to look in your directory first by putting it before the other directories in your PATH:

PATH="$HOME/bin:$PATH"

This is less safe because you are trusting that your own version of the more command works properly. But it is also risky since it might allow for trojan horses (similar to the ls example we just saw). If your bin directory is writable by others on your system, they can install a program that does something nasty.

Proper use of PATH is just one of many aspects of system security. See Chapter 10 for more details. In short, we recommend leaving the current directory out of your PATH (both implicitly and explicitly), adding your personal bin directory at the end of PATH, and making sure that only you can create, remove, or change files in your personal bin directory.
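The last of these recommendations can be checked with a short script. This is a sketch under stated assumptions: writable_by_others is a hypothetical helper, it looks only at the permission string printed by ls -ld, and it does not account for access control lists or for the permissions of parent directories.

```shell
# writable_by_others DIR: succeed if DIR's mode string shows group or other
# write permission. Returns 2 if ls cannot examine DIR at all.
writable_by_others() {
    mode=$(ls -ld "$1") || return 2
    case $mode in
        d????w*|d???????w*) return 0 ;;   # 6th char: group write; 9th: other write
        *)                  return 1 ;;
    esac
}

if writable_by_others "$HOME/bin"; then
    printf 'warning: %s is writable by someone other than you\n' "$HOME/bin"
fi
```

If the warning appears, a command such as chmod go-w on the directory removes write permission for group and others.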

3.4.2.8. PATH and tracked aliases

It is worth noting that a search through the directories in your PATH can take time. You won't exactly die if you hold your breath for the length of time it takes for most computers to search your PATH, but the large number of disk I/O operations involved in some PATH searches can take longer than the command you invoked takes to run!

The Korn shell provides a way to circumvent PATH searches, called a tracked alias. First, notice that if you specify a command by giving its full pathname, the shell won't even use your PATH -- instead, it just goes directly to the executable file.

Tracked aliases do this for you automatically. The first time you invoke a command, the shell looks for the executable in the normal way (through PATH). Then it creates an alias for the full pathname, so that the next time you invoke the command, the shell uses the full pathname and does not bother with PATH at all. If you ever change your PATH, the shell marks tracked aliases as "undefined," so that it searches for the full pathnames again when you invoke the corresponding commands.

In fact, you can add tracked aliases for the sole purpose of avoiding PATH lookup of commands that you use particularly often. Just put a "trivial alias" of the form alias -t command in your .profile or environment file; the shell substitutes the full pathname itself.

For example, the first time you invoke emacs, the shell does a PATH search. Upon finding the location of emacs (say /usr/local/bin/emacs), the shell creates a tracked alias:

alias -t emacs=/usr/local/bin/emacs    # Automatic tracked alias

The next time you run emacs, the shell expands the emacs alias into the full path /usr/local/bin/emacs, and executes the program directly, not bothering with a PATH search.

You can also define individual tracked aliases yourself, with the option -t to the alias command, and you can list all such tracked aliases by typing alias -t by itself. (For compatibility with the System V Bourne shell, ksh predefines the alias hash='alias -t --'; the hash command in that shell displays the internal table of found commands. The Korn shell's tracked alias mechanism is more flexible.)

Although the shell's documentation and trackall option indicate that you can turn alias tracking on and off, the shell's actual behavior is different: alias tracking is always on. alias -t lists all of the automatically-created tracked aliases. However, alias -p does not print tracked aliases. This is because, conceptually, tracked aliases are just a performance enhancement; they are really unrelated to the aliases that you define for customization.

3.4.2.9. Directory search path

CDPATH is a variable whose value, like that of PATH, is a list of directories separated by colons. Its purpose is to augment the functionality of the cd built-in command.

By default, CDPATH isn't set (meaning that it is null), and when you type cd dirname, the shell looks in the current directory for a subdirectory called dirname. Similar to PATH, this search is disabled when dirname starts with a slash. If you set CDPATH, you give the shell a list of places to look for dirname; the list may or may not include the current directory.

Here is an example. Consider the alias for the long cd command from earlier in this chapter:

alias cdcm="cd work/projects/devtools/windows/confman"

Now suppose there were a few directories under this directory to which you need to go often; they are called src, bin, and doc. You define your CDPATH like this:

CDPATH=:~/work/projects/devtools/windows/confman

In other words, you define your CDPATH to be the empty string (meaning the current directory, wherever you happen to be) followed by ~/work/projects/devtools/windows/confman.

With this setup, if you type cd doc, then the shell looks in the current directory for a (sub)directory called doc. Assuming that it doesn't find one, it looks in the directory ~/work/projects/devtools/windows/confman. The shell finds the doc directory there, so you go directly to it.

This works for any relative pathname. For example, if you have a directory src/whizprog in your home directory, and your CDPATH is :$HOME (the current directory and your home directory), typing cd src/whizprog takes you to $HOME/src/whizprog from anywhere on the system.
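The search order can be tried out safely in a scratch area. In the sketch below, every directory lives under a temporary directory created with mktemp (assumed to be available on your system); the names confman, doc, and elsewhere are borrowed or invented purely for the demonstration.

```shell
# All paths below live in a throwaway directory created for this demonstration.
base=$(mktemp -d)
mkdir -p "$base/confman/doc" "$base/elsewhere"

CDPATH=":$base/confman"    # null element first: current directory, then confman
cd "$base/elsewhere"       # there is no doc subdirectory here
cd doc >/dev/null          # found through CDPATH; cd may print the new directory
printf '%s\n' "$PWD"
```

Because the current directory has no doc subdirectory, the shell falls through to the second element of CDPATH and lands in the doc directory under confman.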

This feature gives you yet another way to save typing when you need to cd often to directories that are buried deep in your file hierarchy. You may find yourself going to a specific group of directories often as you work on a particular project, and then changing to another set of directories when you switch to another project. This implies that the CDPATH feature is only useful if you update it whenever your work habits change; if you don't, you may occasionally find yourself where you don't want to be.

3.4.2.10. Miscellaneous variables

We have covered the shell variables that are important from the standpoint of customization. There are also several that serve as status indicators and for various other miscellaneous purposes. Their meanings are relatively straightforward; the more basic ones are summarized in Table 3-4.

The first two variables are set by the login program, before the shell starts. The shell sets the value of the next two whenever you change directories. The final variable's value changes dynamically, as time elapses. Although you can also set the values of any of these, just like any other variables, it is difficult to imagine any situation where you would want to.

Table 3-4. Status variables

Variable	Meaning
`HOME`	Name of your home (login) directory. This is the default argument for the cd command.
`SHELL`	Pathname of the shell that programs should use to run commands.
`PWD`	Current directory.
`OLDPWD`	Previous directory before the last cd command.
`SECONDS`	Number of seconds since the shell was invoked.
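A short sketch shows how the shell maintains PWD and OLDPWD as you move around. It assumes that /tmp exists, and it relies on cd - as a shorthand for returning to the directory named in OLDPWD.

```shell
cd /
cd /tmp
printf 'PWD=%s OLDPWD=%s\n' "$PWD" "$OLDPWD"    # now in /tmp, came from /

cd - >/dev/null      # go back to $OLDPWD; cd - prints the directory, so discard it
printf 'PWD=%s OLDPWD=%s\n' "$PWD" "$OLDPWD"    # the two values have swapped
```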
