[Chapter 6] 6.2 Integer Variables and Arithmetic

The expression $(($OPTIND - 1)) in the last example gives a clue as to how the shell can do integer arithmetic. As you might guess, the shell interprets words surrounded by $(( and )) as arithmetic expressions. Variables in arithmetic expressions do not need to be preceded by dollar signs, though it is not wrong to do so.
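A minimal illustration of the point about dollar signs. echo is used rather than print only so that the same lines also run in shells with the same arithmetic syntax, such as bash; x and y are arbitrary sample variables.

```shell
x=7
y=3
sum=$(( x + y ))      # variables named without a leading $
sum2=$(( $x + $y ))   # with dollar signs: also legal, same result
echo "$sum $sum2"     # prints: 10 10
```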

Like variables and command substitutions, arithmetic expressions are evaluated inside double quotes. We're finally in a position to state the definitive rule about quoting strings: when in doubt, enclose a string in single quotes, unless it contains any expression involving a dollar sign, in which case you should use double quotes. Tildes are the one exception: the shell does not expand a tilde inside either kind of quotes, so leave it outside them (e.g., ~/"my notes" rather than "~/my notes").
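To see the difference the quotes make, here is the same text in double and in single quotes (n is just a sample variable):

```shell
n=4
dq="n squared is $(( n * n ))"   # double quotes: the arithmetic is evaluated
sq='n squared is $(( n * n ))'   # single quotes: the text stays literal
echo "$dq"                       # n squared is 16
echo "$sq"                       # n squared is $(( n * n ))
```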

For example, the date(1) command on System V-derived versions of UNIX accepts arguments that tell it how to format its output. The argument +%j tells it to print the day of the year, i.e., the number of days since December 31st of the previous year.

We can use +%j to print a little holiday anticipation message:

print "Only $(( (365-$(date +%j)) / 7 )) weeks until the New Year!"

We'll show where this fits in the overall scheme of command-line processing in Chapter 7, Input/Output and Command-line Processing.
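The arithmetic in the holiday message can be checked without depending on today's date. This sketch substitutes a fixed, hypothetical day number (45, i.e., February 14) for $(date +%j):

```shell
day=45                           # stands in for $(date +%j)
weeks=$(( (365 - day) / 7 ))     # 320 / 7 = 45.7..., truncated to 45
echo "Only $weeks weeks until the New Year!"
```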

The arithmetic expression feature is built into the Korn shell's syntax; in the Bourne shell (most versions) it was available only through the external command expr(1). Thus it is yet another example of a desirable feature provided by an external command (i.e., a syntactic kludge) being better integrated into the shell. [[...]] and getopts are also examples of this design trend.

Korn shell arithmetic expressions are equivalent to their counterparts in the C language. [5] Precedence and associativity are the same as in C. Table 6.2 shows the arithmetic operators that are supported. Although some of these are (or contain) special characters, there is no need to backslash-escape them, because they are within the $(( ... )) syntax.

[5] The assignment forms of these operators are also permitted. For example, $((x += 2)) adds 2 to x and stores the result back in x.
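For instance, an assignment form inside $((...)) both produces a value and updates the variable as a side effect (x is a sample variable):

```shell
x=3
echo "$(( x += 2 ))"   # prints 5
echo "$x"              # x itself is now 5
: $(( x *= 4 ))        # ':' discards the expansion; only the update matters
echo "$x"              # 20
```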

Table 6.2: Arithmetic Operators
Operator	Meaning
+	Plus
-	Minus
*	Times
/	Division (with truncation)
%	Remainder
<<	Bit-shift left
>>	Bit-shift right
&	Bitwise and
|	Bitwise or
~	Bitwise not
^	Bitwise exclusive or
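A quick sketch of a few operators from the table. The negative-number lines assume C99-style truncation toward zero, in keeping with the comparison to C above; treat that as an assumption about your shell's arithmetic rather than something the table states.

```shell
q=$(( 17 / 3 ))    # 5: the fractional part is discarded
r=$(( 17 % 3 ))    # 2
nq=$(( -17 / 3 ))  # -5, assuming C-style truncation toward zero
nr=$(( -17 % 3 ))  # -2: the remainder takes the sign of the dividend
b=$(( ~5 ))        # -6: bitwise not in two's complement
echo "$q $r $nq $nr $b"
```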

Parentheses can be used to group subexpressions. Like C, the arithmetic expression syntax also supports relational operators, which yield the "truth values" 1 for true and 0 for false. Table 6.3 shows the relational operators and the logical operators that can be used to combine relational expressions.

Table 6.3: Relational Operators
Operator	Meaning
<	Less than
>	Greater than
<=	Less than or equal
>=	Greater than or equal
==	Equal
!=	Not equal
&&	Logical and
||	Logical or

For example, $((3 > 2)) has the value 1; $(( (3 > 2) || (4 <= 1) )) also has the value 1, since at least one of the two subexpressions is true.
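The expressions from this paragraph, plus the && variant, captured in variables so their values can be compared:

```shell
a=$(( 3 > 2 ))                   # 1
b=$(( (3 > 2) || (4 <= 1) ))     # 1: at least one side is true
c=$(( (3 > 2) && (4 <= 1) ))     # 0: the right-hand side is false
echo "$a $b $c"                  # 1 1 0
```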

The shell also supports numbers in bases other than 10, up to base 36. The notation B#N means "N in base B". Of course, if you omit the B#, the base defaults to 10.
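Some sample base#number constants; for bases above 10, letters serve as the extra digits:

```shell
h=$(( 16#ff ))       # hexadecimal: 255
bin=$(( 2#101010 ))  # binary: 42
z=$(( 36#z ))        # in base 36 the digit z stands for 35
d=$(( 10#42 ))       # an explicit base 10, same as plain 42
echo "$h $bin $z $d" # 255 42 35 42
```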

6.2.1 Arithmetic Conditionals

Another construct, closely related to $((...)), is ((...)) (without the leading dollar sign). We use this for evaluating arithmetic condition tests, just as [[...]] is used for string, file attribute, and other types of tests.

((...)) evaluates relational operators differently from $((...)) so that you can use it in if and while constructs. Instead of producing a textual result, it just sets its exit status according to the truth of the expression: 0 if true, 1 otherwise. So, for example, ((3 > 2)) produces exit status 0, as does (( (3 > 2) || (4 <= 1) )), but (( (3 > 2) && (4 <= 1) )) has exit status 1 since the second subexpression isn't true.
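Because ((...)) sets an exit status, it can drive a while loop and an if directly. A small sketch that sums the integers 1 through 5 (i and total are arbitrary names):

```shell
i=1
total=0
while (( i <= 5 )); do          # runs while the test has exit status 0
    total=$(( total + i ))
    i=$(( i + 1 ))
done
if (( total == 15 && i == 6 )); then
    echo "1+2+3+4+5 = $total"
fi
```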

You can also use numerical values for truth values within this construct. This mirrors the analogous concept in C, which makes it somewhat counterintuitive to non-C programmers: a value of 0 means false (i.e., returns exit status 1), and a nonzero value means true (returns exit status 0); e.g., (( 14 )) is true. See the code for the kshdb debugger in Chapter 9 for two more examples of this.
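A short sketch of numeric truth values, including the C-style idiom of looping while a counter is nonzero (t, f, and count are sample names):

```shell
if (( 14 )); then t=true; else t=false; fi   # nonzero: exit status 0
if (( 0 )); then f=true; else f=false; fi    # zero: exit status 1
echo "14 is $t, 0 is $f"                     # 14 is true, 0 is false

count=3
while (( count )); do                        # stops once count reaches 0
    count=$(( count - 1 ))
done
echo "$count"                                # 0
```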

6.2.2 Arithmetic Variables and Assignment

The (( ... )) construct can also be used to define integer variables and assign values to them. The statement:

(( intvar=expression ))

creates the integer variable intvar (if it doesn't already exist) and assigns to it the result of expression.
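Assignments inside ((...)) read like C. The last lines show a common pitfall that follows from the exit-status rule for numeric values: an assignment whose value is 0 reports false. area, n, and rc are sample names.

```shell
(( area = 7 * 6 ))       # no $ needed; spaces around = are fine here
(( area += 8 ))          # the C assignment operators work too
echo "$area"             # 50

rc=0
(( n = 0 )) || rc=$?     # n is set to 0, but the exit status is 1
echo "$n $rc"            # 0 1
```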

That syntax isn't intuitive, so the shell provides a better equivalent: the built-in command let. The syntax is:

let intvar=expression

It is not necessary (because it's actually redundant) to surround the expression with $(( and )) in a let statement. As with any variable assignment, there must not be any space on either side of the equal sign (=). It is good practice to surround expressions with quotes, since many characters are treated as special by the shell (e.g., *, #, and parentheses); furthermore, you must quote expressions that include whitespace (spaces or TABs). See Table 6.4 for examples.
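A few let statements showing when quotes are needed. Without the quotes in the first line, the shell itself would try to interpret the parentheses and the *; y, z, and w are sample names.

```shell
let 'y = (2+3) * 5'    # single quotes protect blanks, parentheses, and *
let z=2+3              # no blanks or special characters: quotes optional
let "w = y / z"        # double quotes work as well
echo "$y $z $w"        # 25 5 5
```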

Table 6.4: Sample Integer Expression Assignments
Assignment	Value of $x
let x=1+4	5
let x='1 + 4'	5
let x='(2+3) * 5'	25
let x='2 + 3 * 5'	17
let x='17 / 3'	5
let x='17 % 3'	2
let x='1<<4'	16
let x='48>>3'	6
let x='17 & 3'	1
let x='17 | 3'	19
let x='17 ^ 3'	18
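One way to check the table yourself is to loop over its right-hand sides. rhs and results are illustrative names, and printf is used only to line up the output columns.

```shell
results=''
for rhs in '1+4' '1 + 4' '(2+3) * 5' '2 + 3 * 5' '17 / 3' '17 % 3' \
           '1<<4' '48>>3' '17 & 3' '17 | 3' '17 ^ 3'; do
    let "x = $rhs"                         # quoted, as the text recommends
    printf '%-10s %s\n' "$rhs" "$x"
    results="${results:+$results }$x"      # collect the values for checking
done
```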

Here is a small task that makes use of integer arithmetic.

Task 6.1

Write a script called pages that, given the name of a text file, tells how many pages of output it contains. Assume that there are 66 lines to a page but provide an option allowing the user to override that.

We'll make our option -N, à la head. The syntax for this single option is so simple that we need not bother with getopts. Here is the code:

if [[ $1 = -+([0-9]) ]]; then
    let page_lines=${1#-}
    shift
else
    let page_lines=66
fi
let file_lines="$(wc -l < $1)"

let pages=file_lines/page_lines
if (( file_lines % page_lines > 0 )); then
    let pages=pages+1
fi

print "$1 has $pages pages of text."

Notice that we use the integer conditional (( file_lines % page_lines > 0 )) rather than the [[ ... ]] form.

At the heart of this code is the UNIX utility wc(1), which counts the number of lines, words, and characters (bytes) in its input. By default, its output looks something like this:

8      34     161  bob

wc's output means that the file bob has 8 lines, 34 words, and 161 characters. wc recognizes the options -l, -w, and -c, which tell it to print only the number of lines, words, or characters, respectively.

wc normally prints the name of its input file (given as an argument). Since we want only the number of lines, we have to do two things. First, we give it its input through redirection, as in wc -l < bob rather than wc -l bob. On many systems, this produces the number of lines preceded by leading whitespace, because wc pads its counts to a fixed width.

Unfortunately, that leading space complicates matters: the statement let file_lines=$(wc -l < $1) becomes "let file_lines= N" after command substitution, and the space after the equal sign splits it into two separate arguments, file_lines= and N, the first of which is an error. That leads to the second modification, the quotes around the command substitution expression. The statement let file_lines=" N" is perfectly legal, and let simply ignores the leading space.
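The same effect can be observed with wc itself. This sketch feeds wc -l three lines from printf; whether the count comes back padded with blanks depends on your system's wc, and the quoted let works either way (raw and lines are sample names).

```shell
raw=$(printf 'one\ntwo\nthree\n' | wc -l)   # "3" or "       3"
let "lines=$raw"         # quoted: one argument; let skips the blanks
echo "[$raw] -> $lines"
```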

The first if clause in the pages script checks for an option and, if it was given, strips off the dash (-) and assigns the remaining number to the variable page_lines. wc in the command substitution expression returns the number of lines in the file whose name is given as an argument.

The next group of lines calculates the number of pages and, if there is a remainder after the division, adds 1. Finally, the appropriate message is printed.
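As an alternative to the separate remainder test (a different technique, not the one the script uses), the page count can be computed as a ceiling division in a single arithmetic expression. count_pages is a hypothetical helper name.

```shell
# Hypothetical helper: round the division up instead of testing the remainder.
function count_pages {
    typeset lines=$1 per_page=$2
    echo $(( (lines + per_page - 1) / per_page ))
}

count_pages 132 66    # 2: exactly two full pages
count_pages 133 66    # 3: one line spills onto a third page
count_pages 0 66      # 0: an empty file has no pages
```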

As a bigger example of integer arithmetic, we will complete our emulation of the C shell's pushd and popd functions (Task 4-8). Remember that these functions operate on DIRSTACK , a stack of directories represented as a string with the directory names separated by spaces. The C shell's pushd and popd take additional types of arguments, which are:

pushd +n takes the nth directory in the stack (starting with 0), rotates it to the top, and cds to it.
pushd without arguments, instead of complaining, swaps the two top directories on the stack and cds to the new top.
popd +n takes the nth directory in the stack and just deletes it.

The most useful of these features is the ability to get at the nth directory in the stack. Here are the latest versions of both functions:

function pushd { # push current directory onto stack
    dirname=$1
    if [[ -d $dirname && -x $dirname ]]; then
        # update the stack before cd, so $PWD is still the old directory
        DIRSTACK="$dirname ${DIRSTACK:-$PWD}"
        cd $dirname
        print "$DIRSTACK"
    else
        print "still in $PWD."
    fi
}

function popd {  # pop directory off the stack, cd to new top
    if [[ -n $DIRSTACK ]]; then
        DIRSTACK=${DIRSTACK#* }
        cd ${DIRSTACK%% *}
        print "$PWD"
    else
        print "stack empty, still in $PWD."
    fi
}

To get at the nth directory, we use a while loop that transfers the top directory to a temporary copy of the stack n times. We'll put the loop into a function called getNdirs that looks like this:

function getNdirs {
    stackfront=''
    let count=0
    while (( count < $1 )); do
        stackfront="$stackfront ${DIRSTACK%% *}"
        DIRSTACK=${DIRSTACK#* }
        let count=count+1
    done
}

The argument passed to getNdirs is the n in question. The variable stackfront is the temporary copy that will contain the first n directories when the loop is done. stackfront starts as null; count, which counts the number of loop iterations, starts as 0.

The first line of the loop body appends the top of the stack (${DIRSTACK%% *}) to stackfront; the second line deletes the top from the stack. The last line increments the counter for the next iteration. The entire loop executes N times, for values of count from 0 to N-1.

When the loop finishes, the last directory in $stackfront is the Nth directory. The expression ${stackfront##* } extracts this directory. Furthermore, DIRSTACK now contains the "back" of the stack, i.e., the stack without the first N directories. With this in mind, we can now write the code for the improved versions of pushd and popd:

function pushd {
    if [[ $1 = ++([0-9]) ]]; then
        # case of pushd +n: rotate n-th directory to top
        let num=${1#+}
        getNdirs $num

        newtop=${stackfront##* }
        stackfront=${stackfront%$newtop}

        # stackfront now begins and ends with a space, so add no extra ones
        DIRSTACK="$newtop$stackfront$DIRSTACK"
        cd $newtop

    elif [[ -z $1 ]]; then
        # case of pushd without args; swap top two directories
        firstdir=${DIRSTACK%% *}
        DIRSTACK=${DIRSTACK#* }
        seconddir=${DIRSTACK%% *}
        DIRSTACK=${DIRSTACK#* }
        DIRSTACK="$seconddir $firstdir $DIRSTACK"
        cd $seconddir

    else
        # normal case of pushd dirname
        dirname=$1
        if [[ -d $dirname && -x $dirname ]]; then
            DIRSTACK="$dirname ${DIRSTACK:-$PWD}"
            cd $dirname
            print "$DIRSTACK"
        else
            print "still in $PWD."
        fi
    fi
}

function popd {      # pop directory off the stack, cd to new top
    if [[ $1 = ++([0-9]) ]]; then
        # case of popd +n: delete n-th directory from stack
        let num=${1#+}
        getNdirs $num
        stackfront=${stackfront% *}
        DIRSTACK="$stackfront $DIRSTACK"
        DIRSTACK=${DIRSTACK# }    # drop the leading space left by getNdirs

    else
        # normal case of popd without argument
        if [[ -n $DIRSTACK ]]; then
            DIRSTACK=${DIRSTACK#* }
            cd ${DIRSTACK%% *}
            print "$PWD"
        else
            print "stack empty, still in $PWD."
        fi
    fi
}

These functions have grown rather large; let's look at them in turn. The if at the beginning of pushd checks whether the first argument is an option of the form +N. If so, the first body of code is run. The first let simply strips the plus sign (+) from the argument and assigns the result, as an integer, to the variable num. This, in turn, is passed to the getNdirs function.

The next two assignment statements set newtop to the Nth directory (i.e., the last directory in $stackfront) and delete that directory from stackfront. The final two lines in this part of pushd put the stack back together again in the appropriate order and cd to the new top directory.

The elif clause tests for no argument, in which case pushd should swap the top two directories on the stack. The first four lines of this clause assign the top two directories to firstdir and seconddir, and delete these from the stack. Then, as above, the code puts the stack back together in the new order and cds to the new top directory.

The else clause corresponds to the usual case, where the user supplies a directory name as argument.

popd works similarly. The if clause checks for the +N option, which in this case means delete the Nth directory. A let extracts the N as an integer; the getNdirs function puts the first N directories into stackfront. Then the line stackfront=${stackfront% *} deletes the last directory (the Nth directory) from stackfront. Finally, the stack is put back together with the Nth directory missing.

The else clause covers the usual case, where the user doesn't supply an argument.

Before we leave this subject, here are a few exercises that should test your understanding of this code:

Add code to pushd that exits with an error message if the user supplies no argument and the stack contains fewer than two directories.
Verify that when the user specifies +N and N exceeds the number of directories in the stack, both pushd and popd use the last directory as the Nth directory.
Modify the getNdirs function so that it checks for the above condition and exits with an appropriate error message if true.
Change getNdirs so that it uses cut (with command substitution), instead of the while loop, to extract the first N directories. This uses less code but runs more slowly because of the extra processes generated.