Executing Your Code (Programming Perl)

18.3. Executing Your Code

To a first approximation, Sparc programs run only on Sparc machines, Intel programs run only on Intel machines, and Perl programs run only on Perl machines. A Perl machine possesses those attributes that a Perl program would find ideal in a computer: memory that is automatically allocated and deallocated; fundamental data types that are dynamic strings, arrays, and hashes with no size limits; and systems that all behave pretty much the same way. The job of the Perl interpreter is to make whatever computer it happens to be running on appear to be one of these idealized Perl machines.

This fictitious machine presents the illusion of a computer specially designed to do nothing but run Perl programs. Each opcode produced by the compiler is a fundamental command in this emulated instruction set. Instead of a hardware program counter, the interpreter just keeps track of the current opcode to execute. Instead of a hardware stack pointer, the interpreter has its own virtual stack. This stack is very important because the Perl virtual machine (which we refuse to call a PVM) is a stack-based machine. Perl opcodes are internally called PP codes (short for "push-pop codes") because they manipulate the interpreter's virtual stack to find all operands, process temporary values, and store all results.

If you've ever programmed in Forth or PostScript, or used an HP scientific calculator with RPN ("Reverse Polish Notation") entry, you know how a stack machine works. Even if you haven't, the concept is simple: to add 3 and 4, you do things in the order 3 4 + instead of the more conventional 3 + 4. What this means in terms of the stack is that you push 3 and then 4 onto the stack, and + then pops both arguments off the stack, adds them, and pushes 7 back onto the stack, where it will sit until you do something else with it.
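If you'd like to see the 3 4 + example run, here is a minimal sketch of a stack evaluator in Python. It is only an illustration of the push-and-pop discipline, not a picture of Perl's internals; the function name eval_rpn and the handful of operators are our own inventions.

```python
def eval_rpn(tokens):
    """Evaluate a list of RPN tokens using an explicit operand stack."""
    binary_ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    stack = []
    for tok in tokens:
        if tok in binary_ops:
            right = stack.pop()   # the second operand was pushed last
            left = stack.pop()
            stack.append(binary_ops[tok](left, right))
        else:
            stack.append(int(tok))  # operands are simply pushed
    return stack

print(eval_rpn("3 4 +".split()))  # [7]
```

The result stays on the stack, just as the 7 does in the description above, waiting for whatever operator comes next.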

Compared with the Perl compiler, the Perl interpreter is a straightforward, almost boring, program. All it does is step through the compiled opcodes, one at a time, and dispatch them to the Perl run-time environment, that is, the Perl virtual machine. It's just a wad of C code, right?
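The "step through the opcodes and dispatch them" loop can be sketched in a few lines. The Python below is a toy under our own assumptions: the Op and VM classes and the pp_const, pp_add, and pp_print functions are hypothetical names, and the real interpreter's data structures are far richer. What it does show is the shape of the idea: there is no program counter, only a current opcode, and each opcode's function works on the virtual stack and then says which opcode comes next.

```python
class Op:
    """One node in a chain of opcodes (all names here are hypothetical)."""
    def __init__(self, name, func, arg=None):
        self.name = name
        self.func = func      # the function that does this opcode's work
        self.arg = arg
        self.next = None      # execution order, filled in by link()

def pp_const(vm, op):
    vm.stack.append(op.arg)           # push a constant operand
    return op.next

def pp_add(vm, op):
    right = vm.stack.pop()
    left = vm.stack.pop()
    vm.stack.append(left + right)     # push the result back
    return op.next

def pp_print(vm, op):
    vm.output.append(str(vm.stack.pop()))
    return op.next

class VM:
    def __init__(self):
        self.stack = []    # the virtual operand stack
        self.output = []

    def run(self, op):
        # No hardware program counter: just "the current opcode".
        while op is not None:
            op = op.func(self, op)

def link(ops):
    """Chain the ops in execution order and return the first one."""
    for cur, nxt in zip(ops, ops[1:]):
        cur.next = nxt
    return ops[0]

program = link([
    Op("const", pp_const, 3),
    Op("const", pp_const, 4),
    Op("add", pp_add),
    Op("print", pp_print),
])
vm = VM()
vm.run(program)
print(vm.output)  # ['7']
```

Letting each opcode return its successor keeps the loop itself trivial; anything clever, such as a branch or a loop, would live in the opcode that chooses what to return.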

Actually, it's not boring at all. A Perl virtual machine keeps track of a great deal of dynamic context on your behalf so that you don't have to. Perl maintains quite a few stacks, which you don't have to understand, but which we'll list here anyway just to impress you:

operand stack: That's the stack we already talked about.
save stack: Where localized values are saved pending restoration. Many internal routines also localize values without your knowing it.
scope stack: The lightweight dynamic context that controls when the save stack should be "popped".
context stack: The heavyweight dynamic context; who called whom to get where you are now. The caller function traverses this stack. Loop-control functions scan this stack to find out which loop to control. When you peel back the context stack, the scope stack gets peeled back appropriately, which restores all your local variables from the save stack, even if you left the earlier context by nefarious methods such as raising an exception and longjmp(3)ing out.
jumpenv stack: The stack of longjmp(3) contexts that allows us to raise exceptions or exit expeditiously.
return stack: Where we came from when we entered this subroutine.
mark stack: Where the current variadic argument list on the operand stack starts.
recursive lexical pad stacks: Where the lexical variables and other "scratch register" storage are kept when subroutines are called recursively.

And of course, there's the C stack on which all the C variables are stored. Perl actually tries to avoid relying on C's stack for the storage of saved values, since longjmp(3) bypasses the proper restoration of such values.
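To make the save stack and scope stack a little less abstract, here is a rough Python model of how the two might cooperate. Everything in it (the Runtime class, localize, protected_call, and so on) is a hypothetical stand-in rather than Perl's real machinery, and a Python exception plays the part of longjmp(3). The point is that the saved values live on the runtime's own stack, so they can still be restored after an abrupt exit.

```python
_MISSING = object()   # marks "this variable did not exist before"

class Runtime:
    def __init__(self):
        self.globals = {}
        self.save_stack = []    # (name, old_value) pairs awaiting restoration
        self.scope_stack = []   # save-stack heights recorded at scope entry

    def enter_scope(self):
        self.scope_stack.append(len(self.save_stack))

    def localize(self, name, new_value):
        # Remember the old value before overwriting it.
        self.save_stack.append((name, self.globals.get(name, _MISSING)))
        self.globals[name] = new_value

    def leave_scope(self):
        # "Pop" the save stack back to where this scope began.
        floor = self.scope_stack.pop()
        while len(self.save_stack) > floor:
            name, old = self.save_stack.pop()
            if old is _MISSING:
                del self.globals[name]
            else:
                self.globals[name] = old

def risky(rt):
    rt.enter_scope()
    rt.localize("x", "inner")
    rt.localize("y", "temp")
    raise RuntimeError("bail out")   # leaves without calling leave_scope()

def protected_call(rt, func):
    depth = len(rt.scope_stack)      # remember how deep we were
    try:
        func(rt)
    except RuntimeError:
        # The callee never left its scopes, so peel them back here.
        while len(rt.scope_stack) > depth:
            rt.leave_scope()

rt = Runtime()
rt.globals["x"] = "outer"
protected_call(rt, risky)
print(rt.globals)  # {'x': 'outer'}
```

Had risky kept the old value of x in one of its own local variables instead, that value would have vanished along with the abandoned stack frame.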

All this is to say that the usual view of an interpreter, a program that interprets another program, is really woefully inadequate to describe what's going on here. Yes, there's some C code implementing some opcodes, but when we say "interpreter", we mean something more than that, in the same way that when we say "musician", we mean something more than a set of DNA instructions for turning notes into sounds. Musicians are real, live organisms and have "state". So do interpreters.

Specifically, all this dynamic and lexical context, along with the global symbol tables, plus the parse trees, plus a thread of execution, is what we call an interpreter. As a context for execution, an interpreter really starts its existence even before the compiler starts, and can run in rudimentary form even as the compiler is building up the interpreter's context. In fact, that's precisely what's happening when the compiler calls into the interpreter to execute BEGIN blocks and such. And the interpreter can turn around and use the compiler to build itself up further. Every time you define another subroutine or load another module, the particular virtual Perl machine we call an interpreter is redefining itself. You can't really say that either the compiler or the interpreter is in control, because they're cooperating to control the bootstrap process we commonly call "running a Perl script". It's like bootstrapping a child's brain. Is it the DNA doing it or is it the neurons? A little of both, we think, with some input from external programmers.
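The back-and-forth between compiler and interpreter can be imitated with a toy. In the Python sketch below, which rests entirely on our own assumptions (the Interp class, the tuple-based "source" format, and define_greet are all made up for illustration), compilation runs any BEGIN block the moment it is seen, so later statements are compiled against a symbol table that the earlier block has already changed.

```python
class Interp:
    def __init__(self):
        self.symbols = {}   # global symbol table, shared by both phases
        self.log = []

    def compile(self, source):
        """Turn statements into run-time ops; BEGIN blocks run immediately."""
        ops = []
        for kind, payload in source:
            if kind == "BEGIN":
                self.log.append("compile: running BEGIN block")
                payload(self)                 # the interpreter runs mid-compile
            elif kind == "call":
                known = payload in self.symbols
                self.log.append(f"compile: call {payload} (defined yet? {known})")
                ops.append(payload)
        return ops

    def run(self, ops):
        for name in ops:
            self.log.append("run: " + self.symbols[name]())

def define_greet(interp):
    interp.symbols["greet"] = lambda: "hello"

source = [
    ("call", "greet"),        # compiled before greet exists
    ("BEGIN", define_greet),  # runs now, during compilation
    ("call", "greet"),        # compiled after BEGIN has defined greet
]
interp = Interp()
ops = interp.compile(source)
interp.run(ops)
print("\n".join(interp.log))
```

Both calls succeed at run time, but only the second one was compiled with greet already defined, which is the sense in which the interpreter's context is built up while compilation is still under way.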

It's possible to run multiple interpreters in the same process; they may or may not share parse trees, depending on whether they were started by cloning an existing interpreter or by building a new interpreter from scratch. It's also possible to run multiple threads in a single interpreter, in which case they share not only parse trees but also global symbols--see Chapter 17, "Threads".

But most Perl programs use only a single Perl interpreter to execute their compiled code. And while you can run multiple, independent Perl interpreters within one process, the current API for this is only accessible from C.[5] Each individual Perl interpreter serves the role of a completely separate process, but doesn't cost as much to create as a whole new process does. That's how Apache's mod_perl extension gets such great performance: when you launch a CGI script under mod_perl, that script has already been compiled into Perl opcodes, eliminating the need for recompilation--but more importantly, eliminating the need to start a new process, which is the real bottleneck. Apache initializes a new Perl interpreter in an existing process and hands that interpreter the previously compiled code to execute. Of course, there's much more to it than that--there always is. For more about mod_perl, see Writing Apache Modules with Perl and C (O'Reilly, 1999).
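The "compile once, run many times" idea is easy to demonstrate by analogy. The sketch below uses Python's built-in compile and exec rather than anything from mod_perl, and the script text, the handle function, and the request_id and seen variables are all hypothetical. Each request gets a fresh namespace, standing in very loosely for a fresh interpreter, but the compiled code is reused as is.

```python
SCRIPT = """
seen.append(request_id)
result = f"request {request_id}: seen={seen}"
"""

# Compilation happens once, up front.
compiled = compile(SCRIPT, "<script>", "exec")

def handle(request_id):
    # Fresh state for every request, so runs cannot interfere.
    namespace = {"request_id": request_id, "seen": []}
    exec(compiled, namespace)   # no recompilation here
    return namespace["result"]

print(handle(1))  # request 1: seen=[1]
print(handle(2))  # request 2: seen=[2]
```

The second request sees none of the first request's state, yet neither one paid for parsing the script again.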

[5] With one exception, so far: revision 5.6.0 of Perl can do cloned interpreters in support of fork emulation on Microsoft Windows. There may well be a Perl API to "ithreads", as they're called, by the time you read this.

Many other applications such as nvi, vim, and innd can embed Perl interpreters; we can't hope to list them all here. There are a number of commercial products that don't even advertise that they have embedded Perl engines. They just use Perl internally because it gets their job done in style.