Chapter 21. Internals and Externals

As we discussed in Chapter 18, "Compiling", perl (the program) contains both a compiler and an interpreter for programs written in Perl (the language). The Perl compiler/interpreter is itself written in C. In this chapter, we'll sketch how that C program works from the perspective of someone who wants either to extend or to embed Perl. When you extend Perl, you're putting a chunk of C code (called the extension) under the control of Perl, and when you embed Perl you're putting a Perl interpreter[1] under the control of a larger C program.

[1] While we are careful to distinguish the compiler from the interpreter when that distinction is important, it gets a bit wearisome to keep saying "compiler/interpreter", so we often just shorten that to "interpreter" to mean the whole glob of C code and data that functions like one instance of perl (the program); when you're embedding Perl, you can have multiple instances of the interpreter, but each behaves like its own little perl.

The brief coverage we provide here is no substitute for the online documentation of Perl's innards: see the documentation for perlguts, perlxs, perlxstut, perlcall, perlapi, and h2xs, all bundled with Perl. Again, unless you're extending or embedding Perl, you will never need to know any of this stuff.

Presuming you need to know, what you need to know first is a bit about Perl's guts. You'll also need to know C for most of what follows. You'll need a C compiler to run the examples. If your end goal is to create a module for other people to use, they'll need a C compiler too. Many of these examples will only run on Unix-like systems. Oh, and this material is subject to change in future releases of Perl.

In other words, here be dragons.

21.1. How Perl Works

When the Perl compiler is fed a Perl program, the first task it performs is lexical analysis: breaking down the program into its basic syntactic elements (often called tokens). If the program is:

print "Hello, world!\n";

the lexical analyzer breaks it down into three tokens: print, "Hello, world!\n", and the final semicolon. The token sequence is then parsed, fixing the relationship between the tokens. In Perl, the boundary between lexical analysis and parsing is blurred more than in other languages. (Other computer languages, that is. If you think about all the different meanings new Critter might have depending on whether there's a Critter package or a subroutine named new, you'll understand why. On the other hand, we disambiguate these kinds of things all the time in English.)
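To make the idea of a token concrete, here is a toy tokenizer written in Perl. It is only a sketch: perl's real lexer is C code and does far more work (and consults the parser as it goes), and the function name toy_tokens is our own invention for illustration. It knows only about whitespace, double-quoted strings, words, and single punctuation characters.

```perl
use strict;
use warnings;

# Toy tokenizer (hypothetical helper, NOT perl's real lexer):
# splits a source string into double-quoted strings, words,
# and single punctuation characters, discarding whitespace.
sub toy_tokens {
    my ($src) = @_;
    my @tokens;
    pos($src) = 0;                      # start scanning at the beginning
    while (pos($src) < length $src) {
        # /gc keeps pos() where it was when a match attempt fails
        if    ($src =~ /\G\s+/gc)                 { next }                # skip whitespace
        elsif ($src =~ /\G("(?:[^"\\]|\\.)*")/gc) { push @tokens, $1 }    # quoted string
        elsif ($src =~ /\G(\w+)/gc)               { push @tokens, $1 }    # word
        elsif ($src =~ /\G(.)/gcs)                { push @tokens, $1 }    # anything else
    }
    return @tokens;
}

# Single quotes keep \n as two literal characters, as in source text.
my @tokens = toy_tokens('print "Hello, world!\n";');
print scalar(@tokens), " tokens\n";
print "  $_\n" for @tokens;
```

Fed the one-line program above, the sketch yields the same three tokens described in the text. What it cannot do is the hard part: deciding what a word like new means requires knowing what has already been parsed.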

Once a program has been parsed and (presumably) understood, it is compiled into a tree of opcodes representing low-level operations, and finally that tree of operations is executed--unless you invoked Perl with the -c ("check syntax") switch, which exits upon completing the compilation phase. It is during compilation, not execution, that BEGIN blocks, CHECK blocks, and use statements are executed.
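A small script shows the split between the two phases. This is a minimal sketch; the @phases array is just a hypothetical log we invented to record the order in which things happen.

```perl
use strict;
use warnings;

our @phases;    # hypothetical log of what ran, in order

push @phases, 'run time: first statement';

# Runs as soon as the compiler has finished parsing the block.
BEGIN { push @phases, 'BEGIN block' }

# Runs once compilation of the main program is complete.
CHECK { push @phases, 'CHECK block' }

push @phases, 'run time: last statement';

print "$_\n" for @phases;
```

Although the BEGIN and CHECK blocks appear in the middle of the file, their entries are logged before either ordinary statement, because those statements do not execute until compilation has finished. If you run the same file with the -c switch, the two blocks still execute, but the ordinary statements (including the final print) never do.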
