Code Generators (Programming Perl)

18.5. Code Generators

The three current backends that convert Perl opcodes into some other format are all emphatically experimental. (Yes, we said this before, but we don't want you to forget.) Even when they happen to produce output that runs correctly, the resulting programs may take more disk space, more memory, and more CPU time than they would ordinarily. This is an area of ongoing research and development. Things will get better.

18.5.1. The Bytecode Generator

The B::Bytecode module writes the parse tree's opcodes out in a platform-independent encoding. You can take a Perl script compiled down to bytecodes and copy that to any other machine with Perl installed on it.
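To get a feel for what "the parse tree's opcodes" are, you can ask the core B module to list them for a compiled subroutine. What follows is only a sketch of that kind of introspection, not a picture of what B::Bytecode itself does. The subroutine add_pair and the method name collect_op_name are made up for this illustration, and the exact op names you see will vary from one version of Perl to the next.

```perl
use strict;
use warnings;
use B ();

# A tiny subroutine whose compiled form we want to inspect (hypothetical).
sub add_pair { my ($x, $y) = @_; $x + $y }

my @tree_order;

# B::walkoptree() calls the named method on every op it visits, each
# parent before its children. The method name is invented for this sketch;
# every op class inherits from B::OP, so defining it there covers them all.
sub B::OP::collect_op_name { push @tree_order, $_[0]->name }

my $cv = B::svref_2object(\&add_pair);       # B::CV object wrapping the sub
B::walkoptree($cv->ROOT, 'collect_op_name'); # walk from the root of its tree

print join(' ', @tree_order), "\n";
```

The list starts at the root of the subroutine's tree and works down toward the leaves. A backend such as B::Bytecode has to visit these same nodes in order to write them out.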

The standard but currently experimental perlcc(1) command knows how to convert Perl source code into a byte-compiled Perl program. All you have to do is:

% perlcc -b -o pbyscript srcscript

And now you should be able to directly "execute" the resulting pbyscript. The start of that file looks somewhat like this:

#!/usr/bin/perl
use ByteLoader 0.03;
^C^@^E^A^C^@^@^@^A^F^@^C^@^@^@^B^F^@^C^@^@^@^C^F^@^C^@^@^@
B^@^@^@^H9^A8M-^?M-^?M-^?M-^?7M-^?M-^?M-^?M-^?6^@^@^@^A6^@
^G^D^D^@^@^@^KR^@^@^@^HS^@^@^@^HV^@M-2W<^FU^@^@^@^@X^Y@Z^@
...

There you find a small script header followed by purely binary data. This may seem like deep magic, but its dweomer, er, dwimmer is at most a minor one. The ByteLoader module uses a technique called a source filter to alter the source code before Perl gets a chance to see it. A source filter is a kind of preprocessor that applies to everything below it in the current file. Instead of being limited to simplistic transformations the way macro processors like cpp(1) and m4(1) are, here there are no constraints. Source filters have been used to augment Perl's syntax, to compress or encrypt source code, even to write Perl programs in Latin. E perlibus unicode; cogito, ergo substr; carp dbm, et al. Er, caveat scriptor.
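A source filter much simpler than ByteLoader can be sketched with the Filter::Util::Call module. The example below is a minimal illustration under a few assumptions: the filter is installed from a BEGIN block in the same file rather than from a separate module's import method, and the names SQUARE and square_of are hypothetical. It shows only the general mechanism of rewriting lines before the parser sees them.

```perl
use strict;
use warnings;
use Filter::Util::Call;

BEGIN {
    # Install a closure filter. It applies to every line of this file
    # that is read after this BEGIN block has finished compiling.
    filter_add(sub {
        my $status = filter_read();          # appends the next line to $_
        s/\bSQUARE\(/square_of(/g if $status > 0;
        return $status;                      # >0 ok, 0 end of file, <0 error
    });
}

# An ordinary subroutine (hypothetical name) that the rewritten text calls.
sub square_of { return $_[0] ** 2 }

# There is no SQUARE function anywhere; by the time the parser sees
# this line, the filter has turned it into a call to square_of(7).
my $result = SQUARE(7);
print "$result\n";
```

Without the BEGIN block, the program would die at run time with an undefined subroutine error. ByteLoader plays the same trick on a grander scale: instead of tweaking a token or two, it consumes everything below its use line.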

The ByteLoader module is a source filter that knows how to disassemble the serialized opcodes produced by B::Bytecode to reconstruct the original parse tree. The reconstituted Perl code is spliced into the current parse tree without using the compiler. When the interpreter hits those opcodes, it just executes them as though they'd been there waiting for it all along.

18.5.2. The C Code Generators

The remaining code generators, B::C and B::CC, both produce C code instead of serialized Perl opcodes. The code they generate is far from readable, and if you try to read it you'll just go blind. It's not something you can use to plug little translated Perl-to-C bits into a larger C program. For that, see Chapter 21, "Internals and Externals".

The B::C module just writes out the C data structures needed to recreate the entire Perl run-time environment. You get a dedicated interpreter with all the compiler-built data structures pre-initialized. In some senses, the code generated is like what B::Bytecode produces. Both are a straight translation of the opcode trees that the compiler built, but where B::Bytecode lays them out in symbolic form to be recreated later and plugged into a running Perl interpreter, B::C lays those opcodes down in C. When you compile this C code with your C compiler and link in the Perl library, the resulting program won't need a Perl interpreter installed on the target system. (It might need some shared libraries, though, if you didn't link everything statically.) However, this program isn't really any different than the regular Perl interpreter that runs your script. It's just precompiled into a standalone executable image.

The B::CC module, however, tries to do more than that. The beginning of the C source file it generates looks pretty much like what B::C produced,[6] but eventually, any similarity ends. In the B::C code, you have a big opcode table in C that's manipulated just as the interpreter would do on its own, whereas the C code generated by B::CC is laid out in the order corresponding to the run-time flow of your program. It even has a C function corresponding to each function in your program. Some amount of optimization based on variable types is done; a few benchmarks can run twice as fast as in the standard interpreter. This is the most ambitious of the current code generators, the one that holds the greatest promise for the future. By no coincidence, it is also the least stable of the three.

[6] But then, so does everything once you've gone blind. Didn't we warn you not to peek?
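The "run-time flow" that B::CC follows is visible from Perl itself: every op records which op runs next, so you can trace the execution order with the core B module. This is a rough sketch, not a description of how B::CC is implemented. The subroutine add_pair is made up for the example, the loop follows only the "next" links (so in code with branches it would trace just one path), and the op names printed depend on your version of Perl.

```perl
use strict;
use warnings;
use B ();

# A straight-line subroutine with no branches (hypothetical).
sub add_pair { my ($x, $y) = @_; $x + $y }

my $cv = B::svref_2object(\&add_pair);   # B::CV object wrapping the sub

# Start at the first op to be executed and keep following next().
# The chain ends in a B::NULL object, whose underlying address is 0,
# so $$op is false there and the loop stops.
my @exec_order;
for (my $op = $cv->START; $$op; $op = $op->next) {
    push @exec_order, $op->name;
}

print join(' -> ', @exec_order), "\n";
```

The root of the subroutine's op tree, leavesub, is the last op in this chain. That is the difference in layout described above: B::C writes the ops out as data in their tree structure, while B::CC emits code following the order in which the ops run.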

Computer science students looking for graduate thesis projects need look no further. There are plenty of diamonds in the rough waiting to be polished off here.