6.4 Cooperating with Other Languages

Just as there are many levels on which languages can compete, so too there are many levels on which languages can cooperate. Here we'll talk primarily about generation, translation and embedding (via linking).

Program Generation

Almost from the time people first figured out that they could write programs, they started writing programs that write other programs. These are called program generators. (If you're a history buff, you might know that RPG stood for Report Program Generator long before it stood for Role Playing Game.) Now, anyone who has written a program generator knows that it can make your eyes go crossed even when you're wide awake. The problem is simply that much of your program's data looks like real code, but isn't (at least not yet). The same text file contains both stuff that does something and similar-looking stuff that doesn't. Perl has various features that make it easier to mix it together with other languages, textually speaking.

Of course, these features also make it easier to write Perl in Perl, but it's rather expected that Perl would cooperate with itself.

Generating other languages in Perl

Perl is, of course, a text-processing language, and most computer languages are textual. Beyond that, the lack of arbitrary limits, together with the various quoting and interpolation mechanisms, makes it pretty easy to visually isolate the code of the other language you're spitting out. For example, here is a small chunk of s2p, the sed-to-perl translator:

print &q(<<"EOT");
:       #!$bin/perl
:       eval 'exec $bin/perl -S \$0 \${1+"\$@"}'
:               if \$running_under_some_shell;
:       
EOT

Here the enclosed text happens to be legal in two languages, both Perl and shell. We've used the trick of putting a colon and a tab on the front of every line, which visually isolates the enclosed code. One variable, $bin, is interpolated in the multi-line quote in two places, and then the string is passed through a function to strip the colon and tab.
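
The stripping function itself is not shown in the excerpt above. A minimal sketch of such a helper might look like this; the name strip_margin, the value given to $bin, and the choice to accept either a space or a tab after the colon are assumptions made for illustration, not the code s2p actually uses.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the leading colon and the single space or tab
# that follows it from every line of the quoted text.
sub strip_margin {
    my ($text) = @_;
    $text =~ s/^:[ \t]//mg;
    return $text;
}

my $bin = '/usr/bin';    # assumed location of the perl binary

# $bin is interpolated now; \$0 and \\n are left for the generated script.
my $generated = strip_margin(<<"EOT");
: #!$bin/perl
: print "generated by \$0\\n";
EOT

print $generated;
```

The margin costs one regular expression at run-time and makes it obvious at a glance which lines belong to the generated program.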

Of course, you aren't required to use multi-line quotes. One often sees CGI scripts containing millions of print statements, one per line. It seems a bit like driving to church in an F-16, but hey, if it gets you there. . . .

When you are embedding a large, multi-line quote containing some other language (such as HTML), it's sometimes helpful to pretend you're embedding Perl in the other language instead:

print <<"END";
stuff
blah blah blah ${ \( EXPR ) } blah blah blah
blah blah blah @{[ LIST ]} blah blah blah
nonsense
END

You can use either of those two tricks to interpolate the value of any scalar EXPR or LIST into a longer string.
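
Here is a small sketch of both tricks at work in a chunk of HTML. The @items list and the markup are made up for the example.

```perl
use strict;
use warnings;

my @items = ('pear', 'apple', 'fig');    # sample data, assumed

my $page = <<"END";
<p>Items: @{[ join ', ', sort @items ]}</p>
<p>Count: ${ \( scalar @items ) }</p>
END

print $page;
```

The first line interpolates a LIST through an anonymous array that is immediately dereferenced; the second interpolates a scalar EXPR through a reference that is immediately dereferenced.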

Generating Perl in other languages

Perl can easily be generated in other languages because it's both concise and malleable. You can pick your quotes not to interfere with the other language's quoting mechanisms. You don't have to worry about indentation, or where you put your line breaks, or whether to backslash your backslashes yet again. You aren't forced to define a package as a single string in advance, since you can slide into your package's namespace repeatedly, whenever you want to evaluate more code in that package.
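
The point about packages can be sketched from the receiving end. In the following, two separately generated chunks of code are evaluated one after the other, and each slides back into the same package. The package name Counter and both chunks are hypothetical; a real generator would produce the strings in whatever language it is written in.

```perl
use strict;
use warnings;

# Two chunks of generated code, evaluated separately. Each one re-enters
# the same (hypothetical) package, Counter.
my @chunks = (
    'package Counter; our $count = 0; sub bump { return ++$count }',
    'package Counter; our $count; sub total { return $count }',
);

for my $chunk (@chunks) {
    eval "$chunk; 1" or die "generated code failed: $@";
}

Counter::bump() for 1 .. 3;
my $total = Counter::total();
print "$total\n";
```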

Translation from Other Languages

One of the very first Perl applications was the sed-to-perl translator, s2p. In fact, Larry delayed the initial release of Perl in order to complete s2p and awk-to-perl (a2p), because he thought they'd improve the acceptance of Perl. Hmm, maybe they did.

s2p

The s2p program takes a sed script specified on the command line (or from standard input) and produces a comparable Perl script on the standard output.

Options include:

-Dnumber

Sets debugging flags.

-n

Specifies that this sed script was always invoked as sed -n. Otherwise a switch parser is prepended to the front of the script.

-p

Specifies that this sed script was never invoked as sed -n. Otherwise a switch parser is prepended to the front of the script.

The Perl script produced looks very sed-like, and there may very well be better ways to express what you want to do in Perl. For instance, s2p does not make any use of the split operator, but you might want to.

The Perl script you end up with may be either faster or slower than the original sed script. If you're only interested in speed you'll just have to try it both ways. Of course, if you want to do something sed doesn't do, you have no choice. It's often possible to speed up the Perl script by various methods, such as deleting all references to $\ and chop.
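
To give a feel for that kind of cleanup, here is a hand-written sketch (not actual s2p output) that contrasts a sed-like loop, which strips the newline and puts it back again, with a more direct one. The subroutine names and the sample input are invented, and chomp stands in for the older chop.

```perl
use strict;
use warnings;

my @input = ("foo one\n", "two foo\n");    # sample lines, assumed

# sed-like style: remove the newline, edit, then re-add it on output.
sub sed_style {
    my @out;
    for my $line (@_) {
        my $copy = $line;
        chomp $copy;
        $copy =~ s/foo/bar/;
        push @out, "$copy\n";
    }
    return @out;
}

# Direct style: the substitution never touches the newline, so leave it alone.
sub direct_style {
    my @out = @_;
    s/foo/bar/ for @out;
    return @out;
}

print sed_style(@input);
print direct_style(@input);
```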

a2p

The a2p program takes an awk script specified on the command line (or from standard input) and produces a comparable Perl script on the standard output.

Options include:

-Dnumber

Sets debugging flags.

-Fcharacter

Tells a2p that this awk script is always invoked with a -F switch specifying character.

-nfieldlist

Specifies the names of the input fields if input does not have to be split into an array for some programmatic reason. If you were translating an awk script that processes the password file, you might say:

a2p -7 -nlogin.password.uid.gid.gcos.home.shell

Any delimiter may be used to separate the field names.

-number

Causes a2p to assume that input will always have that many fields.

a2p cannot do as good a job translating as a human would, but it usually does pretty well. There are some areas where you may want to examine the Perl script produced and tweak it some. Here are some of them, in no particular order.

There is an awk idiom of putting int(...) around a string expression to force numeric interpretation, even though the argument is always an integer anyway. This is generally unneeded in Perl, but a2p can't tell if the argument is always going to be an integer, so it leaves it in. You may wish to remove it.

Perl differentiates numeric comparison from string comparison. awk has one operator for both that decides at run-time which comparison to do. a2p does not try to do a complete job of awk emulation at this point. Instead it guesses which one you want. It's almost always right, but it can be spoofed. All such guesses are marked with the comment #???. You should go through and check them. You might want to run at least once with Perl's -w switch, which warns you if you use == where you should have used eq.

It would be possible to emulate awk's behavior in selecting string versus numeric operations at run-time by inspection of the operands, but it would be gross and inefficient. Besides, a2p almost always guesses right.
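
A two-line illustration shows why the guess matters. The values are chosen for the example; they are equal as numbers but different as strings.

```perl
use strict;
use warnings;

my ($x, $y) = ('10', '10.0');

my $numeric_equal = ($x == $y) ? 1 : 0;    # compares 10 with 10
my $string_equal  = ($x eq $y) ? 1 : 0;    # compares "10" with "10.0"

print "numeric: $numeric_equal, string: $string_equal\n";
```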

Perl does not attempt to emulate the behavior of awk in which nonexistent array elements spring into existence simply by being referenced. If somehow you are relying on this mechanism to create null entries for a subsequent for . . . in, they won't be there in Perl.
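
A short sketch of the difference, using a made-up hash named %seen: merely reading an element leaves the hash empty, so any null entry you depend on has to be stored explicitly.

```perl
use strict;
use warnings;

my %seen;

my $probe      = $seen{missing};       # reading does not create the key
my $after_read = scalar keys %seen;

$seen{missing} = '';                   # create the null entry explicitly
my $after_store = scalar keys %seen;

print "after read: $after_read, after store: $after_store\n";
```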

If a2p makes a split command that assigns to a list of variables that looks like ($Fld1, $Fld2, $Fld3...) you may want to rerun a2p using the -n option mentioned above. This will let you name the fields throughout the script. If it splits to an array instead, the script is probably referring to the number of fields somewhere.

The "exit" statement in awk doesn't necessarily exit; it goes to the END block if there is one. awk scripts that do contortions within the END block to bypass the block under such circumstances can be simplified by removing the conditional in the END block and just exiting directly from the Perl script.

Perl has two kinds of arrays, numerically indexed and associative. awk arrays are usually translated to associative arrays, but if you happen to know that the index is always going to be numeric, you could change the { . . . } to [ . . . ]. Remember that iteration over an associative array is done using the keys function, but iteration over a numeric array isn't. You might need to modify any loop that is iterating over the array in question.
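
The two loop styles look like this. The variable names and data are invented for the sketch.

```perl
use strict;
use warnings;

my %by_name  = (alice => 3, bob => 5);    # associative array
my @by_index = (10, 20, 30);              # numerically indexed array

# Associative array: iterate over the keys.
my @hash_report;
for my $key (sort keys %by_name) {
    push @hash_report, "$key=$by_name{$key}";
}

# Numeric array: iterate over the index range instead.
my @array_report;
for my $i (0 .. $#by_index) {
    push @array_report, "$i=$by_index[$i]";
}

print "@hash_report\n";
print "@array_report\n";
```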

awk starts by assuming OFMT has the value %.6g. Perl starts by assuming its equivalent, $#, to have the value %.20g. You'll want to set $# explicitly if you use the default value of OFMT. (Actually, you probably don't want to set $#, but rather put in printf formats everywhere it matters.)
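
For instance, an explicit format reproduces awk's default precision at the one place it matters, without touching any global setting. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $third = 1 / 3;

# Format with awk's default OFMT of %.6g, right where the number is printed.
my $as_awk = sprintf '%.6g', $third;

print "$as_awk\n";
```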

Near the top of the line loop will be the split operator that is implicit in the awk script. There are times when you can move this operator down past some conditionals that test the entire record, so that the split is not done as often.
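
As a sketch of that rearrangement (hand-written, not a2p output, with invented sample records), the whole-record test comes first and the split happens only for lines that survive it:

```perl
use strict;
use warnings;

my @records = (
    "root:x:0:0:root:/root:/bin/sh\n",
    "daemon:x:1:1:daemon:/usr/sbin:/bin/false\n",
);

my @shells;
for my $record (@records) {
    # Test the entire record first; most lines never reach the split.
    next unless $record =~ /^root:/;

    chomp(my $line = $record);
    my @fld = split /:/, $line;
    push @shells, $fld[6];
}

print "@shells\n";
```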

For aesthetic reasons you may wish to change the array base $[ from 1 back to Perl's default of 0, but remember to change all array subscripts and all substr and index operations to match.

Cute comments that say:

# Here's a workaround because awk is so dumb.

are, of course, passed through unmodified.

awk scripts are often embedded in a shell script that pipes stuff into and out of awk. Often the shell script wrapper can be incorporated into the Perl script, since Perl can start up pipes into and out of itself, and can do other things that awk can't do by itself.
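
Here is a minimal sketch of a script opening its own input pipe instead of relying on a shell wrapper. To keep the example self-contained, the command on the other end of the pipe is just another copy of perl (found through $^X); in a real script it would be whatever command the wrapper used to run.

```perl
use strict;
use warnings;

# $^X is the running perl binary; it stands in for the producer command
# that the shell wrapper would otherwise have piped into the script.
open(my $in, '-|', $^X, '-e', 'print "b\n", "a\n"')
    or die "cannot start producer: $!";
my @lines = <$in>;
close($in) or die "producer failed: $!";

@lines = sort @lines;
print @lines;
```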

Scripts that refer to the special variables RSTART and RLENGTH can often be simplified by referring to the variables $`, $&, and $', as long as they are within the scope of the pattern match that sets them.
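
The correspondence can be sketched as follows, with a made-up string and pattern. The lowercase $rstart and $rlength are ordinary variables computed only to show how awk's values relate to Perl's.

```perl
use strict;
use warnings;

my $text = 'hello world';
my ($rstart, $rlength, $before, $after) = (0, 0, '', '');

if ($text =~ /o w/) {
    $before  = $`;                 # text preceding the match
    $after   = $';                 # text following the match
    $rstart  = length($`) + 1;     # awk's RSTART counts from 1
    $rlength = length($&);         # awk's RLENGTH
}

print "RSTART=$rstart RLENGTH=$rlength before=[$before] after=[$after]\n";
```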

The produced Perl script may have subroutines defined to deal with awk's semantics regarding "getline" and "print". Since a2p usually picks correctness over efficiency, it is almost always possible to rewrite such code to be more efficient by discarding the semantic sugar.

ARGV[0] translates to $0, but ARGV[n] translates to $ARGV[$n]. A loop that tries to iterate over ARGV[0] won't find it.

NOTE:

Storage for the awk syntax tree is currently static, and can run out. You'll need to recompile a2p if that happens.

find2perl

The find2perl program is really easy to understand if you already understand the UNIX find(1) program. Just type find2perl instead of find, and give it the same arguments you would give to find. It will spit out an equivalent Perl script.

There are a couple of options you can use that your ordinary find(1) command probably doesn't support:

-tar tarfile

Outputs a tar file much like the -cpio switch of some versions of find.

-eval string

Evaluates the string as a Perl expression, and continues if true.
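
To show the general shape of a find-like traversal in Perl, here is a hand-written sketch built on the standard File::Find module. It is not actual find2perl output; the temporary directory, the file names, and the test for names ending in .txt are all assumptions made for the example.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a throwaway directory so the example is self-contained.
my $dir = tempdir(CLEANUP => 1);
for my $name ('a.txt', 'b.log') {
    open(my $fh, '>', "$dir/$name") or die "cannot create $name: $!";
    close($fh);
}

# Roughly what "find DIR -type f -name '*.txt' -print" asks for:
# the callback runs once per entry, with $_ set to the entry's basename.
my @found;
find(sub { push @found, $_ if -f $_ && /\.txt\z/ }, $dir);

print "$_\n" for @found;
```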

Source filters

The notion of a source filter started with the idea that a script or module should be able to decrypt itself on the fly, like this:

#!/usr/bin/perl
use MyDecryptFilter;
@*x$]`0uN&k^Zx02jZ^X{.?s!(f;9Q/^A^@~~8H]|,%@^P:q-=
...

But the idea grew from there, and now a source filter can be defined to do any transformation on the input text you like. One can now even do things like this:

#!/usr/bin/perl
use Filter::exec "a2p";
1,30{print $1}

Put that together with the notion of the -x switch mentioned at the beginning of this chapter, and you have a general mechanism for pulling any chunk of program out of an article and executing it, regardless of whether it's written in Perl or not. Now that's cooperation.

The Filter module is available from CPAN.