Loop Statements (Programming Perl)

4.4. Loop Statements

All loop statements have an optional LABEL in their formal syntax. (You can put a label on any statement, but it has a special meaning to a loop.) If present, the label consists of an identifier followed by a colon. It's customary to make the label uppercase to avoid potential confusion with reserved words, and so it stands out better. And although Perl won't get confused if you use a label that already has a meaning like if or open, your readers might.

4.4.1. while and until Statements

The while statement repeatedly executes the block as long as EXPR is true. If the word while is replaced by the word until, the sense of the test is reversed; that is, it executes the block only as long as EXPR remains false. The conditional is still tested before the first iteration, though.
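As a minimal sketch of that symmetry (the variable names are ours, chosen for illustration), the two loops below do the same counting. The until loop simply negates the condition of the while loop, and the last loop shows that the test really does come before the first pass:

```perl
use strict;
use warnings;

my @from_while;
my $i = 1;
while ($i <= 3) {            # runs as long as the test is true
    push @from_while, $i;
    $i++;
}

my @from_until;
my $j = 1;
until ($j > 3) {             # runs as long as the test is false
    push @from_until, $j;
    $j++;
}

# The condition is checked first, so this block never executes.
my $ran = 0;
until (1) { $ran++ }

print "@from_while\n";       # prints "1 2 3"
print "@from_until\n";       # prints "1 2 3"
print "ran: $ran\n";         # prints "ran: 0"
```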

The while or until statement can have an optional extra block: the continue block. This block is executed every time the block is continued, either by falling off the end of the first block or by an explicit next (a loop-control operator that goes to the next iteration). The continue block is not heavily used in practice, but it's in here so we can define the for loop rigorously in the next section.

Unlike the foreach loop we'll see in a moment, a while loop never implicitly localizes any variables in its test condition. This can have "interesting" consequences when while loops use globals for loop variables. In particular, see Section 2.11.2, "Line Input (Angle) Operator" in Chapter 2, "Bits and Pieces" for how implicit assignment to the global $_ can occur in certain while loops, along with an example of how to deal with the problem by explicitly localizing $_. For other loop variables, however, it's best to declare them with my, as in the next example.

A variable declared in the test condition of a while or until statement is visible only in the block or blocks governed by that test. It is not part of the surrounding scope. For example:

while (my $line = <STDIN>) {
    $line = lc $line;
}
continue {
    print $line;   # still visible
}
# $line now out of scope here

Here the scope of $line extends from its declaration in the control expression throughout the rest of the loop construct, including the continue block, but not beyond. If you want the scope to extend further, declare the variable before the loop.

4.4.2. for Loops

The three-part for loop has three semicolon-separated expressions within its parentheses. These expressions function respectively as the initialization, the condition, and the re-initialization expressions of the loop. All three expressions are optional (but not the semicolons); if omitted, the condition is always true. Thus, the three-part for loop can be defined in terms of the corresponding while loop. This:

LABEL:
  for (my $i = 1; $i <= 10; $i++) {
      ...
  }

is like this:

{
    my $i = 1;
  LABEL:
    while ($i <= 10) {
        ...
    }
    continue {
        $i++;
    }
}

except that there's not really an outer block. (We just put one there to show how the scope of the my is limited.)
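One way to convince yourself of the equivalence is to put a next in both forms. In this sketch (the odd-number filter is just an illustration we made up), next in the while version jumps to the continue block, so $i is still incremented, exactly as the third expression of the for loop would be:

```perl
use strict;
use warnings;

my @via_for;
for (my $i = 1; $i <= 5; $i++) {
    next if $i % 2 == 0;         # skip even numbers; $i++ still runs
    push @via_for, $i;
}

my @via_while;
{
    my $i = 1;
    while ($i <= 5) {
        next if $i % 2 == 0;     # next jumps to the continue block
        push @via_while, $i;
    }
    continue {
        $i++;
    }
}

print "@via_for\n";              # prints "1 3 5"
print "@via_while\n";            # prints "1 3 5"
```

Had the increment been written at the bottom of the while block instead of in a continue block, the next would have skipped it and the loop would never terminate.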

If you want to iterate through two variables simultaneously, just separate the parallel expressions with commas:

for ($i = 0, $bit = 1; $i < 32; $i++, $bit <<= 1) {
    print "Bit $i is set\n" if $mask & $bit;
}
# the values in $i and $bit persist past the loop

Or declare those variables to be visible only inside the for loop:

for (my ($i, $bit) = (0, 1); $i < 32; $i++, $bit <<= 1) {
    print "Bit $i is set\n" if $mask & $bit;
}
# loop's versions of $i and $bit now out of scope
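To try the loop out, here is a small self-contained sketch. The value of $mask is a sample we picked for illustration, and the bit numbers are collected in an array rather than printed one per line:

```perl
use strict;
use warnings;

my $mask = 0b1010_0001;          # sample value: bits 0, 5, and 7 are set

my @set_bits;
for (my ($i, $bit) = (0, 1); $i < 32; $i++, $bit <<= 1) {
    push @set_bits, $i if $mask & $bit;   # $bit is always 2 ** $i here
}

print "Bits set: @set_bits\n";   # prints "Bits set: 0 5 7"
```

Note that $bit has to start at 1. Starting it at 0 would leave it at 0 after every shift, and no bit would ever test as set.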

Besides the normal looping through array indices, for can lend itself to many other interesting applications. It doesn't even need an explicit loop variable. Here's one example that avoids the problem you get when you explicitly test for end-of-file on an interactive file descriptor, causing your program to appear to hang.

$on_a_tty = -t STDIN && -t STDOUT;
sub prompt { print "yes? " if $on_a_tty }
for ( prompt(); <STDIN>; prompt() ) {
    # do something
}

Another traditional application for the three-part for loop results from the fact that all three expressions are optional, and the default condition is true. If you leave out all three expressions, you have written an infinite loop:

for (;;) {
    ...
}

This is the same as writing:

while (1) {
    ...
}

If the notion of infinite loops bothers you, we should point out that you can always fall out of the loop at any point with an explicit loop-control operator such as last. Of course, if you're writing the code to control a cruise missile, you may not actually need an explicit loop exit. The loop will be terminated automatically at the appropriate moment.[3]

[3] That is, the fallout from the loop tends to occur automatically.

4.4.3. foreach Loops

The foreach loop iterates over a list of values by setting the control variable (VAR) to each successive element of the list:

foreach VAR (LIST) {
    ...
}

The foreach keyword is just a synonym for the for keyword, so you can use foreach and for interchangeably, whichever you think is more readable in a given situation. If VAR is omitted, the global $_ is used. (Don't worry--Perl can easily distinguish for (@ARGV) from for ($i=0; $i<=$#ARGV; $i++) because the latter contains semicolons.) Here are some examples:

$sum = 0; foreach $value (@array) { $sum += $value }

for $count (10,9,8,7,6,5,4,3,2,1,'BOOM') {  # do a countdown
    print "$count\n"; sleep(1);
}

for (reverse 'BOOM', 1 .. 10) {             # same thing
    print "$_\n"; sleep(1);
}

for $field (split /:/, $data) {             # any LIST expression
    print "Field contains: `$field'\n";
}

foreach $key (sort keys %hash) {
    print "$key => $hash{$key}\n";
}

That last one is the canonical way to print out the key/value pairs of a hash sorted by key. See the keys and sort entries in Chapter 29, "Functions" for more elaborate examples.

There is no way with foreach to tell where you are in a list. You may compare adjacent elements by remembering the previous one in a variable, but sometimes you just have to break down and write a three-part for loop with subscripts. That's what the other kind of for loop is there for, after all.
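Both approaches might look something like the following sketch. The @readings data and the variable names are hypothetical. The foreach version remembers the previous element to detect changes in value, while the three-part for version is used when the position of each change is what you want:

```perl
use strict;
use warnings;

my @readings = (3, 3, 7, 7, 7, 2);    # hypothetical sample data

# foreach: remember the previous element in a variable
my @changes;
my $prev;
foreach my $value (@readings) {
    push @changes, $value if !defined $prev || $value != $prev;
    $prev = $value;
}

# three-part for: subscripts tell us where we are in the list
my @change_at;
for (my $i = 1; $i < @readings; $i++) {
    push @change_at, $i if $readings[$i] != $readings[$i - 1];
}

print "@changes\n";      # prints "3 7 2"
print "@change_at\n";    # prints "2 5"
```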

If LIST consists entirely of assignable values (meaning variables, generally, not enumerated constants), you can modify each of those variables by modifying VAR inside the loop. That's because the foreach loop index variable is an implicit alias for each item in the list that you're looping over. Not only can you modify a single array in place, you can also modify multiple arrays and hashes in a single list:

foreach $pay (@salaries) {               # grant 8% raises
    $pay *= 1.08;
}

for (@christmas, @easter) {              # change menu
    s/ham/turkey/;
}
s/ham/turkey/ for @christmas, @easter;   # same thing

for ($scalar, @array, values %hash) {
    s/^\s+//;                            # strip leading  whitespace
    s/\s+$//;                            # strip trailing whitespace
}

The loop variable is valid only from within the dynamic or lexical scope of the loop and will be implicitly lexical if the variable was previously declared with my. This renders it invisible to any function defined outside the lexical scope of the variable, even if called from within that loop. However, if no lexical declaration is in scope, the loop variable will be a localized (dynamically scoped) global variable; this allows functions called from within the loop to access that variable. In either case, any previous value the localized variable had before the loop will be restored automatically upon loop exit.
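The difference is easiest to see from inside a function called by the loop. In this sketch (all names are invented for the demonstration), peek_global sees each loop value because the global is dynamically scoped, whereas peek_lexical keeps seeing the value its variable had outside the loop:

```perl
use strict;
use warnings;

our $global = 'before';
sub peek_global  { return $global }

my $lexical = 'before';
sub peek_lexical { return $lexical }

my (@seen_global, @seen_lexical);

for $global (1 .. 3) {                 # localized global: visible to callees
    push @seen_global, peek_global();
}
for $lexical (1 .. 3) {                # lexical: invisible to outside functions
    push @seen_lexical, peek_lexical();
}

print "@seen_global\n";                # prints "1 2 3"
print "@seen_lexical\n";               # prints "before before before"
print "$global $lexical\n";            # prints "before before"
```

The last line shows that both variables get their earlier values back once their loops finish.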

If you prefer, you may explicitly declare which kind of variable (lexical or global) to use. This makes it easier for maintainers of your code to know what's really going on; otherwise, they'll need to search back up through enclosing scopes for a previous declaration to figure out which kind of variable it is:

for my  $i    (1 .. 10) { ... }         # $i always lexical
for our $Tick (1 .. 10) { ... }         # $Tick always global

When a declaration accompanies the loop variable, the shorter for spelling is preferred over foreach, since it reads better in English.

Here's how a C or Java programmer might first think to code up a particular algorithm in Perl:

for ($i = 0; $i < @ary1; $i++) {
    for ($j = 0; $j < @ary2; $j++) {
        if ($ary1[$i] > $ary2[$j]) {
            last;         # Can't go to outer loop. :-(
        }
        $ary1[$i] += $ary2[$j];
    }
    # this is where that last takes me
}

But here's how a veteran Perl programmer might do it:

WID: foreach $this (@ary1) {
    JET: foreach $that (@ary2) {
        next WID if $this > $that;
        $this += $that;
    }
}

See how much easier that was in idiomatic Perl? It's cleaner, safer, and faster. It's cleaner because it's less noisy. It's safer because if code gets added between the inner and outer loops later on, the new code won't be accidentally executed, since next (explained below) explicitly iterates the outer loop rather than merely breaking out of the inner one. And it's faster because Perl executes a foreach statement more rapidly than it would the equivalent for loop, since the elements are accessed directly instead of through subscripting.
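If you'd like to check that the two versions really compute the same thing, a quick sketch such as this one will do. The sample arrays are made up, and we've added my declarations so that it runs under use strict:

```perl
use strict;
use warnings;

my @ary2 = (2, 4, 6);                  # sample data for both versions

# Subscript version
my @c_style = (1, 5, 3);
for (my $i = 0; $i < @c_style; $i++) {
    for (my $j = 0; $j < @ary2; $j++) {
        last if $c_style[$i] > $ary2[$j];
        $c_style[$i] += $ary2[$j];
    }
}

# Labeled foreach version; $this aliases each element of @idiomatic
my @idiomatic = (1, 5, 3);
WID: foreach my $this (@idiomatic) {
    foreach my $that (@ary2) {
        next WID if $this > $that;
        $this += $that;
    }
}

print "@c_style\n";                    # prints "7 5 3"
print "@idiomatic\n";                  # prints "7 5 3"
```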

But write it however you like. TMTOWTDI.

Like the while statement, the foreach statement can also take a continue block. This lets you execute a bit of code at the bottom of each loop iteration no matter whether you got there in the normal course of events or through a next.

Speaking of which, now we can finally say it: next is next.

4.4.4. Loop Control

We mentioned that you can put a LABEL on a loop to give it a name. The loop's LABEL identifies the loop for the loop-control operators next, last, and redo. The LABEL names the loop as a whole, not just the top of the loop. Hence, a loop-control operator referring to the loop doesn't actually "go to" the loop label itself. As far as the computer is concerned, the label could just as easily have been placed at the end of the loop. But people like things labeled at the top, for some reason.

Loops are typically named for the item the loop is processing on each iteration. This interacts nicely with the loop-control operators, which are designed to read like English when used with an appropriate label and a statement modifier. The archetypal loop works on lines, so the archetypal loop label is LINE:, and the archetypal loop-control operator is something like this:

next LINE if /^#/;      # discard comments

The syntax for the loop-control operators is:

last LABEL
next LABEL
redo LABEL

The LABEL is optional; if omitted, the operator refers to the innermost enclosing loop. But if you want to jump past more than one level, you must use a LABEL to name the loop you want to affect. That LABEL does not have to be in your lexical scope, though it probably ought to be. But in fact, the LABEL can be anywhere in your dynamic scope. If this forces you to jump out of an eval or subroutine, Perl issues a warning (upon request).
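For instance, here is a minimal sketch of leaving two loops at once. The @grid data and the ROW label are our own invention. A bare last would only leave the inner loop and carry on with the next row:

```perl
use strict;
use warnings;

my @grid = ([1, 2, 3], [4, -5, 6], [7, 8, -9]);   # hypothetical data

my ($found_row, $found_col);
ROW: for my $r (0 .. $#grid) {
    for my $c (0 .. $#{ $grid[$r] }) {
        if ($grid[$r][$c] < 0) {
            ($found_row, $found_col) = ($r, $c);
            last ROW;                  # exits both loops, not just the inner one
        }
    }
}

print "First negative at row $found_row, column $found_col\n";
# prints "First negative at row 1, column 1"
```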

Just as you may have as many return operators in a function as you like, you may have as many loop-control operators in a loop as you like. This is not to be considered wicked or even uncool. During the early days of structured programming, some people insisted that loops and subroutines have only one entry and one exit. The one-entry notion is still a good idea, but the one-exit notion has led people to write a lot of unnatural code. Much of programming consists of traversing decision trees. A decision tree naturally starts with a single trunk but ends with many leaves. Write your code with the number of loop exits (and function returns) that is natural to the problem you're trying to solve. If you've declared your variables with reasonable scopes, everything gets automatically cleaned up at the appropriate moment, no matter how you leave the block.

The last operator immediately exits the loop in question. The continue block, if any, is not executed. The following example bombs out of the loop on the first blank line:

LINE: while (<STDIN>) {
    last LINE if /^$/;      # exit when done with mail header
    ...
}

The next operator skips the rest of the current iteration of the loop and starts the next one. If there is a continue clause on the loop, it is executed just before the condition is re-evaluated, just like the third component of a three-part for loop. Thus it can be used to increment a loop variable, even when a particular iteration of the loop has been interrupted by a next:

LINE: while (<STDIN>) {
    next LINE if /^#/;      # skip comments
    next LINE if /^$/;      # skip blank lines
    ...
} continue {
    $count++;
}

The redo operator restarts the loop block without evaluating the conditional again. The continue block, if any, is not executed. This operator is often used by programs that want to fib to themselves about what was just input. Suppose you were processing a file that sometimes had a backslash at the end of a line to continue the record on the next line. Here's how you could use redo for that:

while (<>) {
    chomp;
    if (s/\\$//) {
        $_ .= <>;
        redo unless eof;    # don't read past each file's eof
    }
    # now process $_
}

which is the customary Perl shorthand for the more explicitly (and tediously) written version:

LINE: while (defined($line = <ARGV>)) {
    chomp($line);
    if ($line =~ s/\\$//) {
        $line .= <ARGV>;
        redo LINE unless eof(ARGV);
    }
    # now process $line
}
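To experiment with this without setting up input files, you could read from an in-memory filehandle instead of ARGV. The sketch below makes that substitution and, as a further simplification of ours, tests whether the extra read returned anything rather than calling eof. The sample text and variable names are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical input: the first record is continued with a trailing backslash.
my $text = "alpha \\\nbeta\ngamma\n";
open my $fh, '<', \$text or die "Can't open in-memory file: $!";

my @records;
my $line;
LINE: while (defined($line = <$fh>)) {
    chomp $line;
    if ($line =~ s/\\$//) {            # strip the continuation marker
        my $more = <$fh>;
        if (defined $more) {
            $line .= $more;
            redo LINE;                 # rerun the block on the joined line
        }
    }
    push @records, $line;              # now process the complete record
}
close $fh;

print "$_\n" for @records;             # prints "alpha beta" then "gamma"
```

Because redo doesn't re-evaluate the condition, no new line is read into $line; the block simply starts over with the joined text, which is then chomped and checked for another backslash.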

Here's an example from a real program that uses all three loop-control operators. Although this particular strategy of parsing command-line arguments is less common now that we have the Getopt::* modules bundled with Perl, it's still a nice illustration of the use of loop-control operators on named, nested loops:

ARG: while (@ARGV && $ARGV[0] =~ s/^-(?=.)//) {
    OPT: for (shift @ARGV) {
         m/^$/       && do {                             next ARG; };
         m/^-$/      && do {                             last ARG; };
         s/^d//      && do { $Debug_Level++;             redo OPT; };
         s/^l//      && do { $Generate_Listing++;        redo OPT; };
         s/^i(.*)//  && do { $In_Place = $1 || ".bak";   next ARG; };
         say_usage("Unknown option: $_");
    }
}
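To watch the control flow in action, the loop can be run against a sample argument list. In this sketch the contents of @ARGV, the my declarations, and the say_usage stand-in are our assumptions; the loop itself is the one shown above:

```perl
use strict;
use warnings;

my ($Debug_Level, $Generate_Listing, $In_Place) = (0, 0, undef);
sub say_usage { die "usage error: @_\n" }          # stand-in for the real routine

@ARGV = ('-ddl', '-i.orig', 'file.txt');           # sample command line

ARG: while (@ARGV && $ARGV[0] =~ s/^-(?=.)//) {
    OPT: for (shift @ARGV) {
         m/^$/       && do {                             next ARG; };
         m/^-$/      && do {                             last ARG; };
         s/^d//      && do { $Debug_Level++;             redo OPT; };
         s/^l//      && do { $Generate_Listing++;        redo OPT; };
         s/^i(.*)//  && do { $In_Place = $1 || ".bak";   next ARG; };
         say_usage("Unknown option: $_");
    }
}

print "debug=$Debug_Level listing=$Generate_Listing in_place=$In_Place\n";
# prints "debug=2 listing=1 in_place=.orig"
print "remaining: @ARGV\n";                        # prints "remaining: file.txt"
```

Each redo OPT reruns the tests on what is left of the current cluster of bundled options, and next ARG moves on to the following argument once that cluster has been used up.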

One more point about loop-control operators. You may have noticed that we are not calling them "statements". That's because they aren't statements--although like any expression, they can be used as statements. You can almost think of them as unary operators that just happen to cause a change in control flow. So you can use them anywhere it makes sense to use them in an expression. In fact, you can even use them where it doesn't make sense. One sometimes sees this coding error:

open FILE, $file
     or warn "Can't open $file: $!\n", next FILE;   # WRONG

The intent is fine, but the next FILE is being parsed as one of the arguments to warn, which is a list operator. So the next executes before the warn gets a chance to emit the warning. In this case, it's easily fixed by turning the warn list operator into the warn function call with some suitably situated parentheses:

open FILE, $file
     or warn("Can't open $file: $!\n"), next FILE;   # okay

However, you might find it easier to read this:

unless (open FILE, $file) {
     warn "Can't open $file: $!\n";
     next FILE;
}