Avant-Garde Compiler, Retro Interpreter (Programming Perl)

18.7. Avant-Garde Compiler, Retro Interpreter

There's a right time to think about everything; sometimes that time is beforehand, and sometimes it's after. Sometimes it's somewhere in the middle. Perl doesn't presume to know when it's the right time to think, so it gives the programmer a number of options for telling it when to think. Other times it knows that some sort of thinking is necessary but doesn't have any idea what it ought to think, so it needs ways of asking your program. Your program answers these kinds of questions by defining subroutines with names appropriate to what Perl is trying to find out.

Not only can the compiler call into the interpreter when it wants to be forward thinking, but the interpreter can also call back to the compiler when it wants to revise history. Your program can use several operators to call back into the compiler. Like the compiler, the interpreter can also call into named subroutines when it wants to find things out. Because of all this give and take between the compiler, the interpreter, and your program, you need to be aware of what things happen when. First we'll talk about when these named subroutines are triggered.

In Chapter 10, "Packages", we saw how a package's AUTOLOAD subroutine is triggered when an undefined function in that package is called. In Chapter 12, "Objects", we met the DESTROY method which is invoked when an object's memory is about to be automatically reclaimed by Perl. And in Chapter 14, "Tied Variables", we encountered the many functions implicitly called when a tied variable is accessed.

These subroutines all follow the convention that, if a subroutine is triggered automatically by either the compiler or the interpreter, we write its name in uppercase. Associated with the different stages of your program's lifetime are four other such subroutines, named BEGIN, CHECK, INIT, and END. The sub keyword is optional before their declarations. Perhaps they are better called "blocks", because they're in some ways more like named blocks than real subroutines.

For instance, unlike regular subroutines, there's no harm in declaring these blocks multiple times, since Perl keeps track of when to call them, so you never have to call them by name. (They are also unlike regular subroutines in that shift and pop act as though you were in the main program, and so they act on @ARGV by default, not @_.)
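The parenthetical point about shift can be checked with a small sketch. The argument values and the variable name $first_arg are made up for illustration, and @ARGV is localized so that the sketch does not depend on how the script is invoked:

```perl
use strict;
use warnings;

our $first_arg;

BEGIN {
    # Stand-in command-line arguments (hypothetical values).
    local @ARGV = ('--verbose', 'input.txt');

    # A bare shift inside BEGIN works on @ARGV, not on @_.
    $first_arg = shift;
}

print "first argument seen at compile time: $first_arg\n";
```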

These four block types run in this order:

BEGIN: Runs ASAP (as soon as parsed) whenever encountered during compilation, before compiling the rest of the file.
CHECK: Runs when compilation is complete, but before the program starts. (CHECK can mean "checkpoint" or "double-check" or even just "stop".)
INIT: Runs at the beginning of execution right before the main flow of your program starts.
END: Runs at the end of execution right after the program finishes.

If you declare more than one of these by the same name, even in separate modules, the BEGINs all run before any CHECKs, which all run before any INITs, which all run before any ENDs--which all run dead last, after your main program has finished. Multiple BEGINs and INITs run in declaration order (FIFO), and the CHECKs and ENDs run in inverse declaration order (LIFO).

This is probably easiest to see in a demo:

#!/usr/bin/perl -l
print       "start main running here";
die         "main now dying here\n";
die         "XXX: not reached\n";
END         { print "1st END: done running"   }
CHECK       { print "1st CHECK: done compiling" }
INIT        { print "1st INIT: started running"  }
END         { print "2nd END: done running"   }
BEGIN       { print "1st BEGIN: still compiling" }
INIT        { print "2nd INIT: started running"  }
BEGIN       { print "2nd BEGIN: still compiling" }
CHECK       { print "2nd CHECK: done compiling" }
END         { print "3rd END: done running"   }

When run, that demo program produces this output:

1st BEGIN: still compiling
2nd BEGIN: still compiling
2nd CHECK: done compiling
1st CHECK: done compiling
1st INIT: started running
2nd INIT: started running
start main running here
main now dying here
3rd END: done running
2nd END: done running
1st END: done running

Because a BEGIN block executes immediately, it can pull in subroutine declarations, definitions, and importations before the rest of the file is even compiled. These can alter how the compiler parses the rest of the current file, particularly if you import subroutine definitions. At the very least, declaring a subroutine lets it be used as a list operator, making parentheses optional. If the imported subroutine is declared with a prototype, calls to it can be parsed like built-ins and can even override built-ins of the same name in order to give them different semantics. The use declaration is just a BEGIN block with an attitude.
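As a rough sketch of the list-operator point, the BEGIN block below installs a subroutine by assigning a code reference to a typeglob, which is approximately what an import does. The name shout is hypothetical. Because the subroutine exists before the following lines are compiled, the call can be written without parentheses:

```perl
use strict;
use warnings;

# Runs as soon as it is parsed, so shout is known to the compiler
# before it reaches the call further down.
BEGIN {
    *shout = sub { return uc join ' ', @_ };
}

# Parsed as a list operator because shout is already declared.
my $line = shout 'hello', 'world';
print "$line\n";
```

Without the BEGIN wrapper, the assignment would not happen until run time, and the unparenthesized call would not compile under use strict.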

END blocks, by contrast, are executed as late as possible: when your program exits the Perl interpreter, even if as a result of an untrapped die or other fatal exception. There are two situations in which an END block (or a DESTROY method) is skipped. It isn't run if, instead of exiting, the current process just morphs itself from one program to another via exec. A process blown out of the water by an uncaught signal also skips its END routines. (See the use sigtrap pragma described in Chapter 31, "Pragmatic Modules", for an easy way to convert catchable signals into exceptions. For general information on signal handling, see "Signals" in Chapter 16, "Interprocess Communication".) To avoid all END processing, you can call POSIX::_exit, say kill -9, $$, or just exec any innocuous program, such as /bin/true on Unix systems.

Inside an END block, $? contains the status the program is going to exit with. You can modify $? from within the END block to change the exit value of the program. Beware of changing $? accidentally by running another program with system or backticks.
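Here is one way this might look in practice. The sketch runs a small child program with the same perl binary ($^X) so that the changed exit status can be observed from the outside. The status values 3 and 7 are arbitrary, and local $? is used to shield the pending status from the system call inside the END block:

```perl
use strict;
use warnings;

# Source text of the child program. It asks to exit with status 0,
# but its END block rewrites $? on the way out.
my $child = q{
    END {
        my $pending = $?;          # status the program is about to exit with
        {
            local $?;              # protect $? from the command below
            system($^X, '-e', 'exit 7');
        }
        $? = 3 if $pending == 0;   # deliberately change the exit value
    }
    exit 0;
};

system($^X, '-e', $child);
my $exit_status = $? >> 8;
print "child exited with status $exit_status\n";
```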

If you have several END blocks within a file, they execute in reverse order of their definition. That is, the last END block defined is the first one executed when your program finishes. This reversal enables related BEGIN and END blocks to nest the way you'd expect, if you pair them up. For example, if the main program and a module it loads both have their own paired BEGIN and END subroutines, like so:

BEGIN { print "main begun" }
END { print "main ended" }
use Module;

and in that module, these declarations:

BEGIN { print "module begun" }
END { print "module ended" }

then the main program knows that its BEGIN will always happen first, and its END will always happen last. (Yes, BEGIN is really a compile-time block, but similar arguments apply to paired INIT and END blocks at run time.) This principle is recursively true for any file that includes another when both have declarations like these. This nesting property makes these blocks work well as package constructors and destructors. Each module can have its own set-up and tear-down functions that Perl will call automatically. This way the programmer doesn't have to remember which special initialization or clean-up code a particular library needs, or when to invoke it. The module's declarations ensure that these events happen.
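To watch the nesting in a single file, the following sketch replaces the separate module with an inline package block. That substitution is an assumption made only to keep the example self-contained. The BEGIN blocks record their order in @events, and the END blocks print as the program exits:

```perl
use strict;
use warnings;

our @events;

BEGIN { push @events, 'main begun' }
END   { print "main ended\n" }

# Stand-in for "use Module;", compiled at this point in the file.
{
    package Module;
    BEGIN { push @main::events, 'module begun' }
    END   { print "module ended\n" }
}

print "$_\n" for @events;
```

The BEGIN lines come out in declaration order, and the END lines in the reverse order, so the main program's pair brackets the module's pair.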

If you think of an eval STRING as a call back from the interpreter to the compiler, then you might think of a BEGIN as a call forward from the compiler into the interpreter. Both temporarily put the current activity on hold and switch modes of operation. When we say that a BEGIN block is executed as early as possible, we mean it's executed just as soon as it is completely defined, even before the rest of the containing file is parsed. BEGIN blocks are therefore executed during compile time, never during run time. Once a BEGIN block has run, it is immediately undefined and any code it used is returned to Perl's memory pool. You couldn't call a BEGIN block as a subroutine even if you tried, because by the time it's there, it's already gone.
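A minimal sketch of both directions, using a hypothetical @log array to record what happens when. The BEGIN block appears after the first statement in the file but runs before it, and the eval hands a string to the compiler while the program is already running:

```perl
use strict;
use warnings;

our @log;

push @log, 'run time: first statement';

# Parsed after the statement above, but executed before it.
BEGIN { push @log, 'compile time: BEGIN block' }

# The string is compiled and then run at this point in execution.
my $sum = eval '2 + 3';
push @log, "run time: eval gave $sum";

print "$_\n" for @log;
```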

Similar to BEGIN blocks, INIT blocks are run just before the Perl run time begins execution, in "first in, first out" (FIFO) order. For example, the code generators documented in perlcc make use of INIT blocks to initialize and resolve pointers to XSUBs. INIT blocks are really just like BEGIN blocks, except they let the programmer distinguish construction that must happen at compile phase from construction that must happen at run phase. When you're running a script directly, that's not terribly important because the compiler gets invoked every time anyway; but when compilation is separate from execution, the distinction can be crucial. The compiler may only be invoked once, and the resulting executable may be invoked many times.
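One visible consequence of the difference is sketched below, with the hypothetical names helper and @seen. A BEGIN block cannot yet see a subroutine defined later in the file, whereas an INIT block runs only after the whole file has been compiled. The sketch assumes it is run directly as a script, since INIT blocks in code compiled after the main program has started are not run:

```perl
use strict;
use warnings;

our @seen;

# Runs the moment it is parsed: helper has not been compiled yet.
BEGIN { push @seen, 'BEGIN: helper ' . (defined &helper ? 'defined' : 'not yet defined') }

# Runs after compilation is complete, just before the main flow starts.
INIT  { push @seen, 'INIT: helper '  . (defined &helper ? 'defined' : 'not yet defined') }

sub helper { return 42 }

print "$_\n" for @seen;
```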

Similar to END blocks, CHECK blocks are run just after the Perl compile phase ends but before run phase begins, in LIFO order. CHECK blocks are useful for "winding down" the compiler just as END blocks are useful for winding down your program. In particular, the backends all use CHECK blocks as the hook from which to invoke their respective code generators. All they need to do is put a CHECK block into their own module, and it will run at the right time, so you don't have to install a CHECK into your program. For this reason, you'll rarely write a CHECK block yourself, unless you're writing such a module.
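For the rare case where you do write one, here is a rough sketch of a CHECK block used as an end-of-compilation sanity check. The hook names setup and teardown are invented for the example, and teardown is deliberately left undefined. Like the backends' hooks, it assumes the file is compiled as part of the main program:

```perl
use strict;
use warnings;

our @missing;

# Runs once the whole file has been compiled, before any run-time statement.
CHECK {
    no strict 'refs';
    for my $name (qw(setup teardown)) {    # hypothetical required hooks
        push @missing, $name unless defined &{"main::$name"};
    }
}

# Defined after the CHECK block in the file, but compiled before it runs.
sub setup { return 'ready' }

print "missing at end of compile: @missing\n";
```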

Putting it all together, Table 18-1 lists various constructs with details on when they compile and when they run the code represented by "...". In the phase columns, C stands for the compile phase and R for the run phase.

Table 18-1. What Happens When

Block or Expression	Compiles During Phase	Traps Compile Errors	Runs During Phase	Traps Run Errors	Call Trigger Policy
`use ...`	C	No	C	No	Now
`no ...`	C	No	C	No	Now
`BEGIN {...}`	C	No	C	No	Now
`CHECK {...}`	C	No	C	No	Late
`INIT {...}`	C	No	R	No	Early
`END {...}`	C	No	R	No	Late
`eval {...}`	C	No	R	Yes	Inline
`eval "..."`	R	Yes	R	Yes	Inline
`foo(...)`	C	No	R	No	Inline
`sub foo {...}`	C	No	R	No	Call anytime
`eval "sub {...}"`	R	Yes	R	No	Call later
`s/pat/.../e`	C	No	R	No	Inline
`s/pat/"..."/ee`	R	Yes	R	Yes	Inline
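A few of the table's rows can be tried out directly. This is only a sketch with made-up variable names. It contrasts eval BLOCK with eval STRING, and /e with /ee:

```perl
use strict;
use warnings;

# eval BLOCK: compiled along with the file; traps errors raised while it runs.
my $block_result = eval { die "run-time failure\n"; 1 };
my $block_error  = $@;

# eval STRING: compiled at run time, so even a syntax error is trapped.
my $string_result = eval 'my $x = ;';
my $string_error  = $@;

# /e compiles the replacement expression with the file;
# /ee then compiles and runs the resulting string at run time.
(my $once  = 'total: 2+3') =~ s/(\d+\+\d+)/$1/e;
(my $twice = 'total: 2+3') =~ s/(\d+\+\d+)/$1/ee;

print "eval BLOCK trapped: $block_error";
print "eval STRING trapped a compile error\n"
    if !defined $string_result && $string_error;
print "$once\n$twice\n";
```

With /e the replacement is simply the matched text, so the string is unchanged. With /ee that text is itself evaluated as Perl code, so the arithmetic is carried out.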

Now that you know the score, we hope you'll be able to compose and perform your Perl pieces with greater confidence.