Control Structures (Programming Perl)

1.6. Control Structures

So far, except for our one large example, all of our examples have been completely linear; we executed each command in order. We've seen a few examples of using the short-circuit operators to cause a single command to be (or not to be) executed. While you can write some very useful linear programs (a lot of CGI scripts fall into this category), you can write much more powerful programs if you have conditional expressions and looping mechanisms. Collectively, these are known as control structures. So you can also think of Perl as a control language.

But to have control, you have to be able to decide things, and to decide things, you have to know the difference between what's true and what's false.

1.6.1. What Is Truth?

We've bandied about the term truth,[20] and we've mentioned that certain operators return a true or a false value. Before we go any further, we really ought to explain exactly what we mean by that. Perl treats truth a little differently than most computer languages, but after you've worked with it a while, it will make a lot of sense. (Actually, we hope it'll make a lot of sense after you've read the following.)

[20] Strictly speaking, this is not true.

Basically, Perl holds truths to be self-evident. That's a glib way of saying that you can evaluate almost anything for its truth value. Perl uses practical definitions of truth that depend on the type of thing you're evaluating. As it happens, there are many more kinds of truth than there are of nontruth.

Truth in Perl is always evaluated in a scalar context. Other than that, no type coercion is done. So here are the rules for the various kinds of values a scalar can hold:

1. Any string is true except for "" and "0".
2. Any number is true except for 0.
3. Any reference is true.
4. Any undefined value is false.

Actually, the last two rules can be derived from the first two. Any reference (rule 3) would point to something with an address and would evaluate to a number or string containing that address, which is never 0 because it's always defined. And any undefined value (rule 4) would always evaluate to 0 or the null string.

And in a way, you can derive rule 2 from rule 1 if you pretend that everything is a string. Again, no string coercion is actually done to evaluate truth, but if the string coercion were done, then any numeric value of 0 would simply turn into the string "0" and be false. Any other number would not turn into the string "0", and so would be true. Let's look at some examples so we can understand this better:

0          # would become the string "0", so false.
1          # would become the string "1", so true.
10 - 10    # 10 minus 10 is 0, would convert to string "0", so false.
0.00       # equals 0, would convert to string "0", so false.
"0"        # is the string "0", so false.
""         # is a null string, so false.
"0.00"     # is the string "0.00", neither "" nor "0", so true!
"0.00" + 0 # would become the number 0 (coerced by the +), so false.
\$a        # is a reference to $a, so true, even if $a is false.
undef()    # is a function returning the undefined value, so false.
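If you'd rather let Perl settle the matter, here is a minimal sketch that runs several of the values above through a conditional. The truth helper, the @checks table, and the variable $x are our own illustrative names, not anything Perl provides.

```perl
use strict;
use warnings;

# Hypothetical helper: report how Perl judges a value in Boolean context.
sub truth {
    my ($value) = @_;
    return $value ? "true" : "false";
}

my $x = 0;    # a false scalar, so we can take a reference to it

# Each entry pairs a label (for printing) with the value to be tested.
my @checks = (
    [ '0',          0          ],
    [ '10 - 10',    10 - 10    ],
    [ '"0"',        "0"        ],
    [ '""',         ""         ],
    [ '"0.00"',     "0.00"     ],
    [ '"0.00" + 0', "0.00" + 0 ],
    [ '\$x',        \$x        ],
    [ 'undef()',    undef()    ],
);

my %result;
for my $check (@checks) {
    my ($label, $value) = @$check;
    $result{$label} = truth($value);
    printf "%-12s is %s\n", $label, $result{$label};
}
```

Only the string "0.00" and the reference should come out true; everything else in the table is one of the canonical false values.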

Since we mumbled something earlier about truth being evaluated in a scalar context, you might be wondering what the truth value of a list is. Well, the simple fact is, none of the operations in Perl will return a list in a scalar context. They'll all notice they're in a scalar context and return a scalar value instead, and then you apply the rules of truth to that scalar. So there's no problem, as long as you can figure out what any given operator will return in a scalar context. As it happens, both arrays and hashes return scalar values that conveniently happen to be true if the array or hash contains any elements. More on that later.
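As a small sketch of that last point, the following tests an empty and a nonempty array and hash for truth. All of the variable names are made up for illustration.

```perl
use strict;
use warnings;

my @empty_list = ();
my @queue      = ("first", "second");
my %no_entries = ();
my %seen       = (apple => 1);

# Assigning an array to a scalar supplies scalar context,
# so we get the number of elements rather than a list.
my $count = @queue;
print "The queue holds $count items.\n";

# In a condition, a container is true only if it has elements.
my $queue_truth = @queue      ? "true" : "false";
my $empty_truth = @empty_list ? "true" : "false";
my $seen_truth  = %seen       ? "true" : "false";
my $none_truth  = %no_entries ? "true" : "false";

print "\@queue is $queue_truth, \@empty_list is $empty_truth\n";
print "\%seen is $seen_truth, \%no_entries is $none_truth\n";
```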

1.6.1.1. The if and unless statements

We saw earlier how a logical operator could function as a conditional. A slightly more complex form of the logical operators is the if statement. The if statement evaluates a truth condition (that is, a Boolean expression) and executes a block if the condition is true:

if ($debug_level > 0) {
    # Something has gone wrong.  Tell the user.
    print "Debug: Danger, Will Robinson, danger!\n";
    print "Debug: Answer was '54', expected '42'.\n";
}

A block is one or more statements grouped together by a set of braces. Since the if statement executes a block, the braces are required by definition. If you know a language like C, you'll notice that this is different. Braces are optional in C if you have a single statement, but the braces are not optional in Perl.

Sometimes, just executing a block when a condition is met isn't enough. You may also want to execute a different block if that condition isn't met. While you could certainly use two if statements, one the negation of the other, Perl provides a more elegant solution. After the block, if can take an optional second clause, introduced by else, whose block is executed only if the truth condition is false. (Veteran computer programmers will not be surprised at this point.)
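A minimal sketch of the two-way choice, reusing $debug_level from the example above. The starting value and the $message variable are assumptions made only so the fragment runs on its own.

```perl
use strict;
use warnings;

my $debug_level = 0;    # assumed value, for illustration only
my $message;

if ($debug_level > 0) {
    $message = "Debug: Danger, Will Robinson, danger!\n";
}
else {
    # Reached only when the condition above is false.
    $message = "All systems nominal.\n";
}

print $message;
```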

At times you may even have more than two possible choices. In this case, you'll want to add an elsif clause, with its own truth condition, for each of the other possible choices. (Veteran computer programmers may well be surprised by the spelling of "elsif", for which nobody here is going to apologize. Sorry.)

if ($city eq "New York") {
    print "New York is northeast of Washington, D.C.\n";
}
elsif ($city eq "Chicago") {
    print "Chicago is northwest of Washington, D.C.\n";
}
elsif ($city eq "Miami") {
    print "Miami is south of Washington, D.C.  And much warmer!\n";
}
else {
    print "I don't know where $city is, sorry.\n";
}

The conditions of the if and elsif clauses are each evaluated in turn, until one is found to be true or the else clause is reached. When one of the conditions is found to be true, its block is executed and all remaining branches are skipped.

Sometimes, you don't want to do anything if the condition is true, only if it is false. Using an empty if with an else may be messy, and a negated if may be illegible; it sounds weird in English to say "if not this is true, do something". In these situations, you would use the unless statement:

unless ($destination eq $home) {
    print "I'm not going home.\n";
}

There is no elsunless though. This is generally construed as a feature.

1.6.2. Iterative (Looping) Constructs

Perl has four main iterative statement types: while, until, for, and foreach. These statements allow a Perl program to repeatedly execute the same code.

1.6.2.1. The while and until statements

The while and until statements behave just like the if and unless statements, except that they'll execute the block repeatedly. That is, they loop. First, the conditional part of the statement is checked. If the condition is met (if it is true for a while or false for an until), the block of the statement is executed. Then the condition is checked again, and the cycle repeats until the condition is no longer met.

while ($tickets_sold < 10000) {
    $available = 10000 - $tickets_sold;
    print "$available tickets are available.  How many would you like: ";
    $purchase = <STDIN>;
    chomp($purchase);
    $tickets_sold += $purchase;
}

Note that if the original condition is never met, the loop will never be entered at all. For example, if we've already sold 10,000 tickets, we might want to have the next line of the program say something like:

print "This show is sold out, please come back later.\n";
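The same logic can be sketched with until, which simply inverts the sense of the test. To keep the sketch runnable without anyone at the keyboard, it assumes a small hall of 100 seats and takes its purchases from a canned list (@purchases) instead of STDIN; both are our inventions.

```perl
use strict;
use warnings;

my $capacity     = 100;                 # assumed, smaller than the 10,000 above
my @purchases    = (40, 35, 25, 10);    # hypothetical stand-in for user input
my $tickets_sold = 0;

# Loop until the condition becomes true, that is, while it is false.
until ($tickets_sold >= $capacity) {
    my $available = $capacity - $tickets_sold;
    my $purchase  = shift @purchases;
    print "$available tickets are available.  Selling $purchase.\n";
    $tickets_sold += $purchase;
}

print "This show is sold out, please come back later.\n";
```

The fourth purchase is never processed, because the condition is already true when it is tested after the third.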

In our Average Example earlier, line 4 reads:

while ($line = <GRADES>) {

This assigns the next line to the variable $line and, as we explained earlier, returns the value of $line so that the condition of the while statement can evaluate $line for truth. You might wonder whether Perl will get a false negative on blank lines and exit the loop prematurely. The answer is that it won't. The reason is clear if you think about everything we've said. The line input operator leaves the newline on the end of the string, so a blank line has the value "\n". And you know that "\n" is not one of the canonical false values. So the condition is true, and the loop continues even on blank lines.

On the other hand, when we finally do reach the end of the file, the line input operator returns the undefined value, which always evaluates to false. And the loop terminates, just when we wanted it to. There's no need for an explicit test of the eof function in Perl, because the input operators are designed to work smoothly in a conditional context.
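Here is a small sketch of that behavior. Rather than depend on a real file, it opens a filehandle on an in-memory string (the $text variable and its contents are made up) that has a blank line in the middle.

```perl
use strict;
use warnings;

# Hypothetical input: three lines, the second of which is blank.
my $text = "alpha\n\nbeta\n";

# Opening a reference to a scalar reads from the string as if it were a file.
open(my $grades, '<', \$text) or die "Cannot open in-memory file: $!";

my $lines_read  = 0;
my $blank_lines = 0;

while (my $line = <$grades>) {
    $lines_read++;
    $blank_lines++ if $line eq "\n";    # "\n" is a true value, so we get here
}

close($grades);

print "Read $lines_read lines, $blank_lines of them blank.\n";
```

As far as we know, current versions of Perl also treat this particular form of while as though it tested defined($line), so even a final line consisting of just "0" with no newline would not end the loop early.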

In fact, almost everything is designed to work smoothly in a conditional (Boolean) context. If you mention an array in a scalar context, the length of the array is returned. So you often see command-line arguments processed like this:

while (@ARGV) {
    process(shift @ARGV);
}

The shift operator removes one element from the argument list each time through the loop (and returns that element). The loop automatically exits when array @ARGV is exhausted, that is, when its length goes to 0. And 0 is already false in Perl. In a sense, the array itself has become "false".[21]

[21] This is how Perl programmers think. So there's no need to compare 0 to 0 to see if it's false. Despite the fact that other languages force you to, don't go out of your way to write explicit comparisons like while (@ARGV != 0). That's just inefficient for both you and the computer. And anyone who has to maintain your code.
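To watch the array run dry, here is a runnable sketch of the loop above. The process routine is a hypothetical stand-in that merely records what it was given, and @ARGV is preset so the sketch works without real command-line arguments.

```perl
use strict;
use warnings;

# Preset the argument list so the sketch runs on its own (an assumption;
# normally the shell fills in @ARGV for you).
@ARGV = ("alpha", "beta", "gamma");

my @processed;

# Hypothetical stand-in for whatever real work you'd do per argument.
sub process {
    my ($arg) = @_;
    push @processed, $arg;
    print "Processing $arg\n";
}

while (@ARGV) {              # true as long as any arguments remain
    process(shift @ARGV);    # shift removes and returns the first one
}

my $remaining = @ARGV;
print "Arguments left: $remaining\n";
```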

1.6.2.2. The for statement

Another iterative statement is the for loop. The for loop runs exactly like the while loop, but looks a good deal different. (C programmers will find it very familiar though.)

for ($sold = 0; $sold < 10000; $sold += $purchase) {
    $available = 10000 - $sold;
    print "$available tickets are available.  How many would you like: ";
    $purchase = <STDIN>;
    chomp($purchase);
}

This for loop takes three expressions within the loop's parentheses: an expression to set the initial state of the loop variable, a condition to test the loop variable, and an expression to modify the state of the loop variable. When a for loop starts, the initial state is set and the truth condition is checked. If the condition is true, the block is executed. When the block finishes, the modification expression is executed, the truth condition is again checked, and if true, the block is rerun with the next value. As long as the truth condition remains true, the block and the modification expression will continue to be executed. (Note that only the middle expression is evaluated for its value. The first and third expressions are evaluated only for their side effects, and the resulting values are thrown away!)
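One way to see the order of events is to write the same loop both ways. In this sketch (the counters and the @for_trace and @while_trace arrays are our own names), the while version spells out where each of the three expressions ends up.

```perl
use strict;
use warnings;

my @for_trace;
for (my $i = 0; $i < 3; $i++) {
    push @for_trace, $i;
}

# Roughly the same loop, unrolled into a while:
my @while_trace;
my $j = 0;                 # first expression: set the initial state
while ($j < 3) {           # second expression: the truth condition
    push @while_trace, $j;
    $j++;                  # third expression: runs after the block
}

print "for:   @for_trace\n";
print "while: @while_trace\n";
```

The two are only roughly the same: a next inside the while version would skip the $j++ at the bottom of the block, whereas the for loop still runs its third expression.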

1.6.2.3. The foreach statement

The last of Perl's iterative statements is the foreach statement, which is used to execute the same code for each of a known set of scalars, such as an array:

foreach $user (@users) {
    if (-f "$home{$user}/.nexrc") {
        print "$user is cool... they use a perl-aware vi!\n";
    }
}

Unlike the if and while statements, which provide scalar context to a conditional expression, the foreach statement provides a list context to the expression in parentheses. So the expression is evaluated to produce a list (not a scalar, even if there's only one scalar in the list). Then each element of the list is aliased to the loop variable in turn, and the block of code is executed once for each list element. Note that the loop variable refers to the element itself, rather than a copy of the element. Hence, modifying the loop variable also modifies the original array.
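A quick sketch of the aliasing, with a made-up @prices array. Doubling the loop variable doubles the array elements themselves.

```perl
use strict;
use warnings;

my @prices = (10, 20, 30);    # hypothetical data

foreach my $price (@prices) {
    $price *= 2;    # $price is an alias, so this changes the array element
}

print "Prices are now: @prices\n";
```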

You'll find many more foreach loops in the typical Perl program than for loops, because it's very easy in Perl to generate the kinds of lists that foreach wants to iterate over. One idiom you'll often see is a loop to iterate over the sorted keys of a hash:

foreach $key (sort keys %hash) {

In fact, line 9 of our Average Example does precisely that.
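Here is that idiom as a complete, if tiny, sketch. The %grades hash and its contents are invented for the occasion.

```perl
use strict;
use warnings;

my %grades = (carol => 88, alice => 93, bob => 75);    # hypothetical data

my @report;
foreach my $student (sort keys %grades) {
    # keys returns the names in no particular order; sort puts them
    # in ascending string order before the loop ever sees them.
    push @report, "$student: $grades{$student}";
}

print "$_\n" foreach @report;
```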

1.6.2.4. Breaking out: next and last

The next and last operators allow you to modify the flow of your loop. It is not at all uncommon to have a special case; you may want to skip it, or you may want to quit when you encounter it. For example, if you are dealing with Unix accounts, you may want to skip the system accounts (like root or lp). The next operator would allow you to skip to the end of your current loop iteration, and start the next iteration. The last operator would allow you to skip to the end of your block, as if your loop's test condition had returned false. This might be useful if, for example, you are looking for a specific account and want to quit as soon as you find it.

foreach $user (@users) {
    if ($user eq "root" or $user eq "lp") {
        next;
    }
    if ($user eq "special") {
        print "Found the special account.\n";
        # do some processing
        last;
    }
}

It's possible to break out of multilevel loops by labeling your loops and specifying which loop you want to break out of. Together with statement modifiers (another form of conditional which we'll talk about later), this can make for extremely readable loop exits (if you happen to think English is readable):

LINE: while ($line = <ARTICLE>) {
    last LINE if $line eq "\n"; # stop on first blank line
    next LINE if $line =~ /^#/; # skip comment lines
    # your ad here
}
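The example above has only one loop, so the label is mostly decoration. A sketch with two nested loops shows where labels earn their keep; the arrays, the labels OUTER and INNER, and the search for a product of 12 are all just illustrative choices.

```perl
use strict;
use warnings;

my @xs = (1, 2, 3, 4);
my @ys = (5, 6, 7);

my $found;
my $pairs_tested = 0;

OUTER: foreach my $x (@xs) {
    INNER: foreach my $y (@ys) {
        next OUTER if $x == 1;    # abandon this $x entirely, not just this $y
        $pairs_tested++;
        if ($x * $y == 12) {
            $found = "$x x $y";
            last OUTER;           # leave both loops at once
        }
    }
}

print "Found $found after testing $pairs_tested pairs.\n";
```

A bare last here would leave only the inner loop, and the outer loop would carry on with the next $x.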

You may be saying, "Wait a minute, what's that funny ^# thing there inside the leaning toothpicks? That doesn't look much like English." And you're right. That's a pattern match containing a regular expression (albeit a rather simple one). And that's what the next section is about. Perl is the best text processing language in the world, and regular expressions are at the heart of Perl's text processing.