18.2. Compiling Your Code

Perl is always in one of two modes of operation: either it is compiling your program, or it is executing it--never both at the same time. Throughout this book, we refer to certain events as happening at compile time, or we say that "the Perl compiler does this and that". At other points, we mention that something else occurs at run time, or that "the Perl interpreter does this and that". Although you can get by with thinking of both the compiler and interpreter as simply "Perl", understanding which of these two roles Perl is playing at any given point is essential to understanding why many things happen as they do. The perl executable implements both roles: first the compiler, then the interpreter. (Other roles are possible, too; perl is also an optimizer and a code generator. Occasionally, it's even a trickster--but all in good fun.)

It's also important to understand the distinction between compile phase and compile time, and between run phase and run time. A typical Perl program gets one compile phase, and then one run phase. A "phase" is a large-scale concept, but compile time and run time are small-scale concepts. A given compile phase does mostly compile-time stuff, but it also does some run-time stuff via BEGIN blocks. A given run phase does mostly run-time stuff, but it can do compile-time stuff through operators like eval STRING.

In the typical course of events, the Perl compiler reads through your entire program source before execution starts. This is when Perl parses the declarations, statements, and expressions to make sure they're syntactically legal.[3] If it finds a syntax error, the compiler attempts to recover from the error so it can report any other errors later in the source. Sometimes this works, and sometimes it doesn't; syntax errors have a noisy tendency to trigger a cascade of false alarms. Perl bails out in frustration after about 10 errors.
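To see the phase/time distinction in action, here is a minimal sketch of our own (the printed messages are invented for illustration):

    #!/usr/bin/perl
    # Compile phase, run-time work: Perl executes a BEGIN block as soon
    # as it finishes compiling it, before compiling the rest of the file.
    BEGIN { print "1: compile phase, running early\n" }

    # Run phase, run-time work: ordinary statements wait for execution.
    print "2: run phase, ordinary execution\n";

    # Run phase, compile-time work: eval STRING compiles code on the fly.
    eval 'print "3: run phase, but freshly compiled\n"';

Run it and the BEGIN message appears first, printed while the rest of the program is still being compiled.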
In addition to the interpreter that processes the BEGIN blocks, the compiler processes your program with the connivance of three notional agents. The lexer scans for each minimal unit of meaning in your program. These are sometimes called "lexemes", but you'll more often hear them referred to as tokens in texts about programming languages. The lexer is sometimes called a tokener or a scanner, and what it does is sometimes called lexing or tokenizing. The parser then tries to make sense out of groups of these tokens by assembling them into larger constructs, such as expressions and statements, based on the grammar of the Perl language. The optimizer rearranges and reduces these larger groupings into more efficient sequences. It picks its optimizations carefully, not wasting time on marginal optimizations, because the Perl compiler has to be blazing fast when used as a load-and-go compiler.

This doesn't happen in independent stages, but all at once with a lot of cross talk between the agents. The lexer occasionally needs hints from the parser to know which of several possible token types it's looking at. (Oddly, lexical scope is one of the things the lexical analyzer doesn't understand, because that's the other meaning of "lexical".) The optimizer also needs to keep track of what the parser is doing, because some optimizations can't happen until the parse has reached a certain point, like finishing an expression, statement, block, or subroutine.

You may think it odd that the Perl compiler does all these things at once instead of one after another, but it's really just the same messy process you go through to understand natural language on the fly, while you're listening to it or reading it. You don't wait till the end of a chapter to figure out what the first sentence meant. You could think of the tokens, expressions, and statements of a program as corresponding to the words, phrases, and sentences of a story.
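The slash character shows why the lexer needs those hints: whether / means division or begins a pattern match depends on what the parser has seen just before it. A small sketch of our own, with made-up data:

    my $total   = 10;
    my @numbers = (1 .. 20);

    # After a complete term, the lexer treats "/" as the division operator.
    my $half = $total / 2;

    # Where an operand is expected, the same "/" begins a regular expression.
    my @evens = grep { /[02468]$/ } @numbers;

    print "$half\n";     # 5
    print "@evens\n";    # 2 4 6 8 10 12 14 16 18 20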
Assuming the parse goes well, the compiler deems your input a valid story, er, program. If you use the -c switch when running your program, it prints out a "syntax OK" message and exits. Otherwise, the compiler passes the fruits of its efforts on to other agents. These "fruits" come in the form of a parse tree. Each fruit on the tree--or node, as it's called--represents one of Perl's internal opcodes, and the branches on the tree represent that tree's historical growth pattern. Eventually, the nodes will be strung together linearly, one after another, to indicate the execution order in which the run-time system will visit those nodes.

Each opcode is the smallest unit of executable instruction that Perl can think about. You might see an expression like $a = -($b + $c) as one statement, but Perl thinks of it as six separate opcodes. Laid out in a simplified format, the parse tree for that expression would look like Figure 18-2. The numbers represent the visitation order that the Perl run-time system will eventually follow.

Figure 18-2. Opcode visitation order of $a = -($b + $c)

Perl isn't a one-pass compiler as some might imagine. (One-pass compilers are great at making things easy for the computer and hard for the programmer.) It's really a multipass, optimizing compiler consisting of at least three different logical passes that are interleaved in practice. Passes 1 and 2 run alternately as the compiler repeatedly scurries up and down the parse tree during its construction; pass 3 happens whenever a subroutine or file is completely parsed. Here are those passes:
During compilation, Perl optimizes your code in many, many ways. It rearranges code to make it more efficient at execution time. It deletes code that can never be reached during execution, like an if (0) block, or the elsifs and the else in an if (1) block. If you use lexically typed variables declared with my ClassName $var or our ClassName $var, and the ClassName package was set up with the use fields pragma, accesses to constant fields from the underlying pseudohash are typo-checked at compile time and converted into array accesses instead. If you supply the sort operator with a simple enough comparison routine, such as {$a <=> $b} or {$b cmp $a}, this is replaced by a call to compiled C code.

Perl's most dramatic optimization is probably the way it resolves constant expressions as soon as possible. For example, consider the parse tree shown in Figure 18-2. If nodes 1 and 2 had both been literals or constant functions, nodes 1 through 4 would have been replaced by the result of that computation, something like Figure 18-3.

Figure 18-3. Constant folding

This is called constant folding. Constant folding isn't limited to simple cases such as turning 2**10 into 1024 at compile time. It also resolves function calls--both built-ins and user-declared subroutines that meet the criteria from the section "Inlining Constant Functions" in Chapter 6, "Subroutines". Reminiscent of FORTRAN compilers' notorious knowledge of their own intrinsic functions, Perl also knows which of its own built-ins to call during compilation. That's why if you try to take the log of 0.0 or the sqrt of a negative constant, you'll incur a compilation error, not a run-time error, and the interpreter is never run at all.[4]
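You don't need a special build of Perl to catch constant folding in the act. Assuming your copy ships with the standard B::Deparse module (covered later in this chapter), you can ask the compiler to regenerate source code from the already-optimized parse tree:

    % perl -MO=Deparse -e '$x = 2**10; $y = "camel" . "s"'
    $x = 1024;
    $y = 'camels';
    -e syntax OK

Both constant expressions were folded before any code ran: the exponentiation became the literal 1024, and the concatenation became a single string constant.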
Even arbitrarily complicated expressions are resolved early, sometimes triggering the deletion of complete blocks such as the one here:

    if (2 * sin(1)/cos(1) < 3 && somefn()) { whatever() }

No code is generated for what can never be evaluated. Because the first part is always false, neither somefn nor whatever can ever be called. (So don't expect to goto labels inside that block, because it won't even exist at run time.) If somefn were an inlinable constant function, then even switching the evaluation order like this:

    if (somefn() && 2 * sin(1)/cos(1) < 3) { whatever() }

wouldn't change the outcome, since the entire expression still resolves at compile time. If whatever were inlinable, it wouldn't be called at run time, nor even during compilation; its value would just be inserted as though it were a literal constant. You would then incur a warning about a "Useless use of a constant in void context". This might surprise you if you didn't realize it was a constant. However, if whatever were the last statement evaluated in a function called in a nonvoid context (as determined by the optimizer), you wouldn't see the warning.

You can see the final result of the constructed parse tree after all optimization stages with perl -Dx. (The -D switch requires a special, debugging-enabled build of Perl.) Also see the section on B::Deparse described below.

All in all, the Perl compiler works hard (but not too hard) to optimize code so that, come run time, overall execution is sped up. It's about time to get your program running, so let's do that now.