String Operators (Learning the Korn Shell, 2nd Edition)

4.5.1. Syntax of String Operators

The basic idea behind the syntax of string operators is that special characters that denote operations are inserted between the variable's name and the right curly brace. Any argument that the operator may need is inserted to the operator's right.

The first group of string-handling operators tests for the existence of variables and allows substitutions of default values under certain conditions. These are listed in Table 4-2.

Table 4-2. Substitution operators

Operator	Substitution
`${varname:-word}`	If varname exists and isn't null, return its value; otherwise return word.
Purpose:	Returning a default value if the variable is undefined.
Example:	`${count:-0}` evaluates to 0 if `count` is undefined.

`${varname:=word}`	If varname exists and isn't null, return its value; otherwise set it to word and then return its value.[55]
Purpose:	Setting a variable to a default value if it is undefined.
Example:	`${count:=0}` sets `count` to 0 if it is undefined.

`${varname:?message}`	If varname exists and isn't null, return its value; otherwise print `varname: message` and abort the current command or script. Omitting message produces the default message `parameter null or not set`. Note, however, that interactive shells do not abort.
Purpose:	Catching errors that result from variables being undefined.
Example:	`${count:?"undefined!"}` prints `count: undefined!` and exits if `count` is undefined.

`${varname:+word}`	If varname exists and isn't null, return word; otherwise return null.
Purpose:	Testing for the existence of a variable.
Example:	`${count:+1}` returns 1 (which could mean "true") if `count` is defined.

[55] Pascal, Modula, and Ada programmers may find it helpful to recognize the similarity of this to the assignment operators in those languages.

The colon (:) in each of these operators is actually optional. If the colon is omitted, then change "exists and isn't null" to "exists" in each definition, i.e., the operator tests for existence only.
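To see the four operators side by side, along with the difference the colon makes, you can try them on one variable that is unset and another that is set but null. This small sketch uses only POSIX syntax, so it should behave the same way in the Korn shell and in other POSIX shells; the names count, empty, and undefined_var are just for illustration.

```shell
# Compare the substitution operators on an unset and a null variable.
unset count                    # count does not exist at all
empty=''                       # empty exists but is null

echo "1: ${count:-0}"          # default used: 1: 0
echo "2: ${empty:-0}"          # with the colon, null counts as missing: 2: 0
echo "3: [${empty-0}]"         # without the colon, existence is enough: 3: []
echo "4: ${count:+1}"          # count is undefined, so :+ yields null: 4:
echo "5: ${count:=0}"          # := returns the default and also assigns it: 5: 0
echo "6: count is now $count"  # 6: count is now 0
echo "7: ${count:+1}"          # count is now defined, so :+ yields 1: 7: 1

# :? aborts a non-interactive shell, so run it in a subshell to keep going.
( : "${undefined_var:?"undefined!"}" ) 2>/dev/null || echo "8: :? aborted the subshell"
```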

The first two of these operators are ideal for setting defaults for command-line arguments in case the user omits them. We'll actually use all four in Task 4-1, which is our first programming task.

Task 4-1

You have a large album collection, and you want to write some software to keep track of it. Assume that you have a file of data on how many albums you have by each artist. Lines in the file look like this:

14 Bach, J.S.
1 Balachander, S.
21 Beatles
6 Blakey, Art

Write a program that prints the N highest lines, i.e., the N artists by whom you have the most albums. The default for N should be 10. The program should take one argument for the name of the input file and an optional second argument for how many lines to print.

By far the best approach to this type of script is to use built-in Unix utilities, combining them with I/O redirectors and pipes. This is the classic "building-block" philosophy of Unix that is another reason for its great popularity with programmers. The building-block technique lets us write a first version of the script that is only one line long:

sort -nr "$1" | head -${2:-10}

Here is how this works: the sort(1) program sorts the data in the file whose name is given as the first argument ($1). (The double quotes allow for spaces or other unusual characters in file names, and also prevent wildcard expansion.) The -n option tells sort to interpret the first word on each line as a number (instead of as a character string); the -r tells it to reverse the comparisons, so as to sort in descending order.

The output of sort is piped into the head(1) utility, which, when given the argument -N, prints the first N lines of its input on the standard output. The expression -${2:-10} evaluates to a dash (-) followed by either the second argument, if it is given, or 10 if it is not; notice that the variable in this expression is 2, which is the second positional parameter.

Assume the script we want to write is called highest. Then if the user types highest myfile, the line that actually runs is:

sort -nr myfile | head -10

Or if the user types highest myfile 22, the line that runs is:

sort -nr myfile | head -22

Make sure you understand how the :- string operator provides a default value.
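If you want to try the one-liner before saving it as a script, one option is to wrap it in a shell function and run it on a small sample file. In this sketch the function name highest, the temporary file, and the four sample lines are illustrative. It also writes the count as head -n N, the POSIX spelling; the older head -N form used in the text is still accepted by most versions of head.

```shell
# highest FILE [N]: print the N lines with the largest leading numbers.
# ${2:-10} supplies N = 10 when no second argument is given.
highest() {
    sort -nr "$1" | head -n "${2:-10}"
}

# Sample data in the format described in Task 4-1 (hypothetical file).
albums=$(mktemp) || exit 1
trap 'rm -f "$albums"' EXIT
printf '%s\n' '14 Bach, J.S.' '1 Balachander, S.' '21 Beatles' '6 Blakey, Art' > "$albums"

highest "$albums" 2    # like "highest myfile 2": the Beatles and Bach lines
highest "$albums"      # like "highest myfile": N defaults to 10, so all four lines print
```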

This is a perfectly good, runnable script -- but it has a few problems. First, its one line is a bit cryptic. While this isn't much of a problem for such a tiny script, it's not wise to write long, elaborate scripts in this manner. A few minor changes make the code more readable.

First, we can add comments to the code; anything between # and the end of a line is a comment. At minimum, the script should start with a few comment lines that indicate what the script does and the arguments it accepts. Next, we can improve the variable names by assigning the values of the positional parameters to regular variables with mnemonic names. Last, we can add blank lines to space things out; blank lines, like comments, are ignored. Here is a more readable version:

#	highest filename [howmany]
#
#	Print howmany highest-numbered lines in file filename.
#	The input file is assumed to have lines that start with
#	numbers.  Default for howmany is 10.

filename=$1

howmany=${2:-10}
sort -nr "$filename" | head -$howmany

The square brackets around howmany in the comments adhere to the convention in Unix documentation that square brackets denote optional arguments.

The changes we just made improve the code's readability but not how it runs. What if the user invoked the script without any arguments? Remember that positional parameters default to null if they aren't defined. If there are no arguments, then $1 and $2 are both null. The variable howmany ($2) is set up to default to 10, but there is no default for filename ($1). The result would be that this command runs:

sort -nr | head -10

As it happens, if sort is called without a filename argument, it expects input to come from standard input, e.g., a pipe (|) or a user's keyboard. Since it doesn't have the pipe, it will expect the keyboard. This means that the script will appear to hang! Although you could always type CTRL-D or CTRL-C to get out of the script, a naive user might not know this.

Therefore we need to make sure that the user supplies at least one argument. There are a few ways of doing this; one of them involves another string operator. We'll replace the line:

filename=$1

with:

filename=${1:?"filename missing."}

This causes two things to happen if a user invokes the script without any arguments: first, the shell prints the somewhat unfortunate message to the standard error output:

highest: line 1: : filename missing.

Second, the script exits without running the remaining code.
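You can watch :? stop a script by handing a two-line script body to sh -c with no arguments. This sketch assumes a POSIX sh on your PATH. The exact wording of the error message varies from shell to shell, so pay attention to the exit status and to the fact that the echo never runs.

```shell
# No arguments: ${1:?...} writes the message to standard error, and the
# shell exits before the echo can run.
sh -c 'filename=${1:?"filename missing."}; echo "sorting $filename"' ||
    echo "stopped with exit status $?"

# With an argument (the "sh" fills in $0), the same code runs to the end
# and prints: sorting albums.txt
sh -c 'filename=${1:?"filename missing."}; echo "sorting $filename"' sh albums.txt
```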

With a somewhat "kludgy" modification, we can get a slightly better error message. Consider this code:

filename=$1
filename=${filename:?"missing."}

This results in the message:

highest: line 2: filename: missing.

(Make sure you understand why.) Of course, there are ways of printing whatever message is desired; we'll find out how in Chapter 5.

Before we move on, we'll look more closely at the two remaining operators in Table 4-2 and see how we can incorporate them into our task solution. The := operator does roughly the same thing as :-, except that it has the side effect of setting the value of the variable to the given word if the variable doesn't exist.

Therefore we would like to use := in our script in place of :-, but we can't; we'd be trying to set the value of a positional parameter, which is not allowed. But if we replaced:

howmany=${2:-10}

with just:

howmany=$2

and moved the substitution down to the actual command line (as we did at the start), then we could use the := operator:

sort -nr "$filename" | head -${howmany:=10}

Using := has the added benefit of setting the value of howmany to 10 in case we need it afterwards in later versions of the script.
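The following sketch makes the side effect of := visible and shows why it cannot be applied to a positional parameter. The set -- line merely simulates the command-line arguments of highest albums.txt; albums.txt is a made-up file name, and nothing is actually sorted.

```shell
set -- albums.txt            # simulate "highest albums.txt" with no second argument

howmany=$2                   # null: there is no second argument
echo "head -${howmany:-10}"  # head -10
echo "[$howmany]"            # []    :- left howmany unchanged
echo "head -${howmany:=10}"  # head -10
echo "[$howmany]"            # [10]  := assigned the default as well

# Positional parameters cannot be assigned this way; the subshell keeps
# the resulting error from ending this script.
( : "${2:=10}" ) 2>/dev/null || echo "cannot assign to \$2 with :="
```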

The final substitution operator is :+. Here is how we can use it in our example: let's say we want to give the user the option of adding a header line to the script's output. If he types the option -h, the output will be preceded by the line:

ALBUMS  ARTIST

Assume further that this option ends up in the variable header, i.e., $header is -h if the option is set or null if not. (Later we see how to do this without disturbing the other positional parameters.)

The expression:

${header:+"ALBUMS  ARTIST\n"}

yields null if the variable header is null or ALBUMS  ARTIST\n if it is non-null. This means that we can put the line:

print -n ${header:+"ALBUMS  ARTIST\n"}

right before the command line that does the actual work. The -n option to print causes it not to print a newline after printing its arguments. Therefore this print statement prints nothing -- not even a blank line -- if header is null; otherwise it prints the header line and a newline (\n).
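Here is a portable sketch of the header logic, wrapped in a hypothetical function called show_header. It uses printf '%b' in place of the Korn shell's print -n so that it also runs in other POSIX shells: %b expands the \n in its argument, and an empty argument produces no output at all.

```shell
# show_header: print the column header only when $header is non-null.
show_header() {
    printf '%b' "${header:+ALBUMS  ARTIST\n}"
}

header=-h
show_header          # prints the header line followed by a newline

header=
show_header          # prints nothing, not even a blank line
```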

Table 4-3. Regular expression operators

Operator	Meaning
*(exp)	0 or more occurrences of exp
+(exp)	1 or more occurrences of exp
?(exp)	0 or 1 occurrences of exp
@(exp1|exp2|...)	Exactly one of exp1 or exp2 or ...
!(exp)	Anything that doesn't match exp[57]

Table 4-4. Regular expression operator examples

Expression	Matches
x	x
*(x)	Null string, x, xx, xxx, ...
+(x)	x, xx, xxx, ...
?(x)	Null string, x
!(x)	Any string except x
@(x)	x (see below)
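The operators in the two tables above are easiest to explore with a case statement. The Korn shell understands them natively; this sketch also runs in bash, where the first line enables bash's extglob option (in ksh that line fails harmlessly, since there is no shopt command, and its error message is discarded). The function classify and its test words are hypothetical.

```shell
# bash needs extglob for these patterns; ksh93 has them built in.
shopt -s extglob 2>/dev/null || true

# classify WORD: report the first pattern in the list that matches WORD.
classify() {
    case $1 in
        +([[:digit:]]))  echo "$1: one or more digits" ;;
        @(cat|dog))      echo "$1: exactly cat or dog" ;;
        ?(x)y)           echo "$1: y, optionally preceded by one x" ;;
        !(*.c))          echo "$1: anything not ending in .c" ;;
    esac
}

classify 2024        # one or more digits
classify dog         # exactly cat or dog
classify y           # y with no x in front
classify xy          # the same pattern, with the optional x present
classify main.c      # matches none of the patterns, so prints nothing
classify notes.txt   # anything not ending in .c
```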

Table 4-5. POSIX character classes

Class	Matching characters
[:alnum:]	Alphanumeric characters
[:alpha:]	Alphabetic characters
[:blank:]	Space and tab characters
[:cntrl:]	Control characters
[:digit:]	Numeric characters
[:graph:]	Printable and visible (non-space) characters
[:lower:]	Lowercase characters
[:print:]	Printable characters (includes whitespace)
[:punct:]	Punctuation characters
[:space:]	Whitespace characters
[:upper:]	Uppercase characters
[:xdigit:]	Hexadecimal digits
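Character classes are used inside an ordinary bracket expression, which is why they appear with doubled brackets in practice, as in [[:digit:]]. This sketch uses only POSIX patterns, so no extended pattern support is needed; the function char_class is hypothetical and reports the first class in its list that contains a given character.

```shell
# char_class C: print the first POSIX class in the list that contains C.
char_class() {
    case $1 in
        [[:digit:]]) echo digit ;;
        [[:upper:]]) echo upper ;;
        [[:lower:]]) echo lower ;;
        [[:space:]]) echo space ;;
        [[:punct:]]) echo punct ;;
        *)           echo other ;;
    esac
}

for c in 7 Q q ' ' ';'; do
    printf '[%s] %s\n' "$c" "$(char_class "$c")"
done
```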

Table 4-6. Shell versus egrep/awk regular expression operators

Korn shell	egrep/awk	Meaning
*(exp)	exp*	0 or more occurrences of exp
+(exp)	exp+	1 or more occurrences of exp
?(exp)	exp?	0 or 1 occurrences of exp
@(exp1|exp2|...)	exp1|exp2|...	exp1 or exp2 or ...
!(exp)	(none)	Anything that doesn't match exp
\N	\N (grep)	Match same text as matched by previous parenthesized subexpression number N
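One way to check the correspondence in the table above is to match the same words against a Korn shell pattern and against the equivalent extended regular expression with grep -E, the POSIX form of egrep. A shell pattern always has to match the whole string, while a regular expression matches anywhere unless it is anchored, so the sketch adds grep's -x option, which demands a whole-line match. As before, the shopt line is there only for bash; the function compare is hypothetical.

```shell
shopt -s extglob 2>/dev/null || true

# compare WORD: match WORD against the shell pattern @(cat|dog)+([[:digit:]])
# and against the equivalent ERE (cat|dog)[[:digit:]]+ (whole line, via -x).
compare() {
    case $1 in
        @(cat|dog)+([[:digit:]])) shell=yes ;;
        *)                        shell=no ;;
    esac
    if printf '%s\n' "$1" | grep -Eqx '(cat|dog)[[:digit:]]+'; then
        ere=yes
    else
        ere=no
    fi
    echo "$1: shell=$shell ere=$ere"
}

for word in cat1 dog42 cat dog4x bird7; do
    compare "$word"    # the two answers should always agree
done
```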

Table 4-7. New pattern matching operators in ksh93l and later

Operator	Meaning
{N}(exp)	Exactly N occurrences of exp
{N,M}(exp)	Between N and M occurrences of exp
*-(exp)	0 or more occurrences of exp, shortest match
+-(exp)	1 or more occurrences of exp, shortest match
?-(exp)	0 or 1 occurrences of exp, shortest match
@-(exp1|exp2|...)	Exactly one of exp1 or exp2 or ..., shortest match
{N}-(exp)	Exactly N occurrences of exp, shortest match
{N,M}-(exp)	Between N and M occurrences of exp, shortest match

Table 4-8. Regular expression escape sequences

Escape sequence	Meaning
\d	Same as [[:digit:]]
\D	Same as [![:digit:]]
\s	Same as [[:space:]]
\S	Same as [![:space:]]
\w	Same as [[:word:]]
\W	Same as [![:word:]]

Table 4-9. Pattern-matching operators

Operator	Meaning
${variable#pattern}	If the pattern matches the beginning of the variable's value, delete the shortest part that matches and return the rest.
${variable##pattern}	If the pattern matches the beginning of the variable's value, delete the longest part that matches and return the rest.
${variable%pattern}	If the pattern matches the end of the variable's value, delete the shortest part that matches and return the rest.
${variable%%pattern}	If the pattern matches the end of the variable's value, delete the longest part that matches and return the rest.

Expression      Result
${path##/*/}                    long.file.name
${path#/*/}           billr/mem/long.file.name
$path           /home/billr/mem/long.file.name
${path%.*}      /home/billr/mem/long.file
${path%%.*}     /home/billr/mem/long
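Since #, ##, % and %% are part of POSIX, the results above can be reproduced in practically any modern shell. The last two lines sketch a common use, quick substitutes for basename and dirname; unlike the real utilities, they do not handle edge cases such as a trailing slash or a name with no slash at all.

```shell
path=/home/billr/mem/long.file.name

echo "${path##/*/}"   # long.file.name             (longest /*/ removed from the front)
echo "${path#/*/}"    # billr/mem/long.file.name   (shortest /*/ removed from the front)
echo "${path%.*}"     # /home/billr/mem/long.file  (shortest .* removed from the end)
echo "${path%%.*}"    # /home/billr/mem/long       (longest .* removed from the end)

# Rough equivalents of basename and dirname for a path with no trailing slash:
echo "${path##*/}"    # long.file.name
echo "${path%/*}"     # /home/billr/mem
```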

Table 4-10. Pattern substitution operators

Operator	Meaning
${variable:start}
${variable:start:length}	These represent substring operations. The result is the value of variable starting at position start and going for length characters. The first character is at position 0, and if no length is provided, the rest of the string is used.
	When used with $* or $@ or an array indexed by * or @ (see Chapter 6), start is a starting index and length is the count of elements. In other words, the result is a slice out of the positional parameters or array. Both start and length may be arithmetic expressions.
	Beginning with ksh93m, a negative start is taken as relative to the end of the string. For example, if a string has 10 characters, numbered 0 to 9, a start value of -2 means 8 (10 - 2 = 8). Similarly, if variable is an indexed array, a negative start yields an index by working backwards from the highest subscript in the array.

${variable/pattern/replace}	If variable contains a match for pattern, the first match is replaced with the text of replace.

${variable//pattern/replace}	This is the same as the previous operation, except that every match of the pattern is replaced.

${variable/pattern}	If variable contains a match for pattern, delete the first match of pattern.

${variable/#pattern/replace}	If variable contains a match for pattern, the first match is replaced with the text of replace. The match is constrained to occur at the beginning of variable's value. If it doesn't match there, no substitution occurs.

${variable/%pattern/replace}	If variable contains a match for pattern, the first match is replaced with the text of replace. The match is constrained to occur at the end of variable's value. If it doesn't match there, no substitution occurs.
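This sketch exercises the substring and substitution operators. Two details are easy to trip over. First, a negative start needs a space (or parentheses) after the colon, because ${x:-2} would be read as the :- default-value operator. Second, positions count from 0. Not every shell has these operators (dash, for example, does not), but bash supports the same syntax, so the sketch should run in either ksh93 or bash; the variables x and s are illustrative.

```shell
x=0123456789

echo "${x:3}"        # 3456789  (from position 3 to the end)
echo "${x:3:2}"      # 34       (2 characters starting at position 3)
echo "${x: -2}"      # 89       (negative start counts back from the end)

s=banana
echo "${s/an/AN}"    # bANana   (first match replaced)
echo "${s//an/AN}"   # bANANa   (every match replaced)
echo "${s/an}"       # bana     (first match deleted)
echo "${s/#ba/BA}"   # BAnana   (match must be at the beginning)
echo "${s/#an/AN}"   # banana   (no match at the beginning, so unchanged)
echo "${s/%na/NA}"   # banaNA   (match must be at the end)

set -- a b c d e
echo "${@:2:3}"      # b c d    (a slice of the positional parameters)
```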

$ x='12345abc6789'
$ print ${x//+([[:digit:]])/X}     # Substitution with longest match
XabcX
$ print ${x//+-([[:digit:]])/X}    # Substitution with shortest match
XXXXXabcXXXX
$ print ${x##+([[:digit:]])}       # Remove longest match
abc6789
$ print ${x#+([[:digit:]])}        # Remove shortest match
2345abc6789

Table 4-11. Name-related operators

Operator	Meaning
${!variable}	Return the name of the real variable referenced by the nameref variable.
${!base*}, ${!base@}	List of all variables whose names begin with base.
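A short sketch of the name-related operators follows. In ksh93, typeset -n declares a nameref (nameref is a predefined alias for it), and bash 4.3 and later accept the same declaration; in both shells ${!ref} then expands to the name of the variable that ref refers to. The variable names are illustrative, and the order in which ${!album_*} lists the matching names is not something to rely on.

```shell
count=42
typeset -n ref=count    # ref is now another name for count
echo "${!ref}"          # count  (the name of the referenced variable)
echo "$ref"             # 42     (an ordinary expansion follows the reference)

album_count=3
album_total=27
echo "${!album_*}"      # the names album_count and album_total, not their values
```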

4.5. String Operators

4.5.1. Syntax of String Operators

Table 4-2. Substitution operators

Task 4-1

4.5.2. Patterns and Regular Expressions

4.5.2.1. Regular expression basics

Table 4-3. Regular expression operators

Table 4-4. Regular expression operator examples

4.5.2.2. POSIX character class additions

Table 4-5. POSIX character classes

4.5.2.3. Korn shell versus awk/egrep regular expressions

Table 4-6. Shell versus egrep/awk regular expression operators

4.5.2.4. Pattern matching with regular expressions

Table 4-7. New pattern matching operators in ksh93l and later

Table 4-8. Regular expression escape sequences

4.5.3. Pattern-Matching Operators

Table 4-9. Pattern-matching operators

Task 4-2

Task 4-3

4.5.4. Pattern Substitution Operators

Table 4-10. Pattern substitution operators

4.5.4.1. Greedy versus non-greedy matching

4.5.5. Variable Name Operators

Table 4-11. Name-related operators

4.5.6. Length Operators

4.5.7. The .sh.match Variable