Handling Insecure Code (Programming Perl)

23.3. Handling Insecure Code

Taint checking is just the sort of security blanket you need if you want to catch bogus data you ought to have caught yourself, but didn't think to catch before passing off to the system. It's a bit like the optional warnings Perl can give you--they may not indicate a real problem, but on average the pain of dealing with the false positives is less than the pain of not dealing with the false negatives. With tainting, the latter pain is even more insistent, because using bogus data doesn't just give the wrong answers; it can blow your system right out of the water, along with your last two years of work. (And maybe your next two, if you didn't make good backups.) Taint mode is useful when you trust yourself to write honest code but don't necessarily trust whoever is feeding you data not to try to trick you into doing something regrettable.

Data is one thing. It's quite another matter when you don't even trust the code you're running. What if you fetch an applet off the Net and it contains a virus, or a time bomb, or a Trojan horse? Taint checking is useless here because the data you're feeding the program may be fine--it's the code that's untrustworthy. You're placing yourself in the position of someone who receives a mysterious device from a stranger, with a note that says, "Just hold this to your head and pull the trigger." Maybe you think it will dry your hair, but you might not think so for very long.

In this realm, prudence is synonymous with paranoia. What you want is a system that lets you impose a quarantine on suspicious code. The code can continue to exist, and even perform certain functions, but you don't let it wander around doing just anything it feels like. In Perl, you can impose a kind of quarantine using the Safe module.

23.3.1. Safe Compartments

The Safe module lets you set up a sandbox, a special compartment in which all system operations are trapped, and namespace access is carefully controlled. The low-level, technical details of this module are in a state of flux, so here we'll take a more philosophical approach.

23.3.1.1. Restricting namespace access

At the most basic level, a Safe object is like a safe, except the idea is to keep the bad people in, not out. In the Unix world, there is a syscall known as chroot(2) that can permanently consign a process to running only in a subdirectory of the directory structure--in its own private little hell, if you will. Once the process is put there, there is no way for it to reach files outside, because there's no way for it to name files outside.[13] A Safe object is a little like that, except that instead of being restricted to a subset of the filesystem's directory structure, it's restricted to a subset of Perl's package structure, which is hierarchical just as the filesystem is.

[13]Some sites do this for executing all CGI scripts, using loopback, read-only mounts. It's something of a pain to set up, but if someone ever escapes, they'll find there's nowhere to go.

Another way to look at it is that the Safe object is like one of those observation rooms with one-way mirrors that the police put suspicious characters into. People on the outside can look into the room, but those inside can't see out.

When you create a Safe object, you may give it a package name if you want. If you don't, a new one will be chosen for you:

use Safe;
my $sandbox = Safe->new("Dungeon");
$Dungeon::foo = 1;  # Direct access is discouraged, though.

If you fully qualify variables and functions using the package name supplied to the new method, you can access them in that package from the outside, at least in the current implementation. This may change, however, since the current plan is to clone the symbol table into a new interpreter. Slightly more upward compatible might be to set things up first before creating the Safe, as shown below. This is likely to continue working and is a handy way to set up a Safe that has to start off with a lot of "state". (Admittedly, $Dungeon::foo isn't a lot of state.)

use Safe;
$Dungeon::foo = 1;  # Still direct access, still discouraged.
my $sandbox = Safe->new("Dungeon");

But Safe also provides a way to access the compartment's globals even if you don't know the name of the compartment's package. So for maximal upward compatibility (though less than maximal speed), we suggest you use the reval method:

use Safe;
my $sandbox = Safe->new();
$sandbox->reval('$foo = 1');

(In fact, that's the same method you'll use to run suspicious code.) When you pass code into the compartment to compile and run, that code thinks that it's really living in the main package. What the outside world calls $Dungeon::foo, the code inside thinks of as $main::foo, or $::foo, or just $foo if you aren't running under use strict. It won't work to say $Dungeon::foo inside the compartment, because that would really access $Dungeon::Dungeon::foo. By giving the Safe object its own notion of main, variables and subroutines in the rest of your program are protected.
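
The separation between the two notions of main can be seen in a minimal sketch like the one below. It assumes a default compartment and uses only the reval, varglob, and root methods described in Safe's own documentation; the variable names are purely illustrative.

```perl
use strict;
use warnings;
use Safe;

our $foo = 'outside';                  # the real $main::foo

my $sandbox = Safe->new;               # package name chosen for us
$sandbox->reval('$foo = "inside"');    # sets the compartment's $main::foo
die "reval failed: $@" if $@;

# Fetch the compartment's $foo without knowing its package name.
my $inside = ${ $sandbox->varglob('foo') };

print "outer \$foo is still '$foo'\n";
print "compartment \$foo is '$inside'\n";
print "compartment root package: ", $sandbox->root, "\n";
```

The outer program's $foo is untouched because the compartment's idea of main is really the root package reported by the root method.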

To compile and run code inside the compartment, use the reval ("restricted eval") method, passing the code string as its argument. Just as with any other eval STRING construct, compilation errors and run-time exceptions in reval don't kill your program. They just abort the reval and leave the exception in $@, so make sure to check it after every reval call.

Using the initializations given earlier, this code will print out that "foo is now 2":

$sandbox->reval('$foo++; print "foo is now $main::foo\n"');
if ($@) {
    die "Couldn't compile code in box: $@";
}
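
To make the three possible outcomes of a reval concrete, here is a small sketch, assuming a default compartment. The code strings are made up for illustration and deliberately use only arithmetic and die, which the default compartment is expected to permit.

```perl
use strict;
use warnings;
use Safe;

my $sandbox = Safe->new;

# 1. Success: the value of the last expression comes back, $@ is empty.
my $ok     = $sandbox->reval('6 * 7');
my $ok_err = $@;

# 2. Compilation error: reval returns undef and $@ describes the problem.
my $bad_syntax = $sandbox->reval('6 *');
my $syntax_err = $@;

# 3. Run-time exception: reval again returns undef, with the exception in $@.
my $died        = $sandbox->reval('die "boom\n"');
my $runtime_err = $@;

print "success:       $ok\n";
print "compile error: $syntax_err";
print "run-time die:  $runtime_err";
```

Copying $@ into a variable right after each call, as done here, avoids losing it to a later eval.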

If you just want to compile code and not run it, wrap your string in a subroutine declaration:

$sandbox->reval(q{
    our $foo;
    sub say_foo {
        print "foo is now $main::foo\n";
    }
}, 1);
die if $@;      # check compilation

This time we passed reval a second argument which, since it's true, tells reval to compile the code under the strict pragma. From within the code string, you can't disable strictness, either, because importing and unimporting are just two of the things you can't normally do in a Safe compartment. There are a lot of things you can't do normally in a Safe compartment--see the next section.

Once you've created the say_foo function in the compartment, these are pretty much the same:

$sandbox->reval('say_foo()');       # Best way.
die if $@;

$sandbox->varglob('say_foo')->();   # Call through anonymous glob.

Dungeon::say_foo();                 # Direct call, strongly discouraged.

23.3.1.2. Restricting operator access

The other important thing about a Safe object is that Perl limits the available operations within the sandbox. (You might well let your kid take a bucket and shovel into the sandbox, but you'd probably draw the line at a bazooka.) It's not enough to protect just the rest of your program; you need to protect the rest of your computer, too.

When you compile Perl code in a Safe object, either with reval or rdo (the restricted version of the do FILE operator), the compiler consults a special, per-compartment access-control list to decide whether each individual operation is deemed safe to compile. This way you don't have to stress out (much) worrying about unforeseen shell escapes, opening files when you didn't mean to, strange code assertions in regular expressions, or most of the external access problems folks normally fret about. (Or ought to.)
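
As a rough illustration of that compile-time check, the sketch below feeds a few snippets to a default compartment and records whether each one was allowed or trapped. The snippets and the %verdict hash are invented for the example. Nothing dangerous should actually run, because trapped operations are rejected before execution begins.

```perl
use strict;
use warnings;
use Safe;

my $sandbox = Safe->new;    # default operator mask

my @snippets = (
    '2 + 2',                          # plain arithmetic
    'system("echo pwned")',           # subprocess
    '`echo pwned`',                   # backticks
    'unlink "no-such-file.example"',  # filesystem write
);

my %verdict;
for my $code (@snippets) {
    $sandbox->reval($code);
    $verdict{$code} = $@ =~ /trapped by operation mask/ ? 'trapped'
                    : $@                                ? 'failed'
                    :                                     'allowed';
}

printf "%-32s %s\n", $_, $verdict{$_} for @snippets;
```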

The interface for specifying which operators should be permitted or restricted is currently under redesign, so we only show how to use the default set of them here. For details, consult the online documentation for the Safe module.

The Safe module doesn't offer complete protection against denial-of-service attacks, especially when used in its more permissive modes. Denial-of-service attacks consume all available system resources of some type, denying other processes access to essential system facilities. Examples of such attacks include filling up the kernel process table, dominating the CPU by running forever in a tight loop, exhausting available memory, and filling up a filesystem. These problems are very difficult to solve, especially portably. See the end of Section 23.3.2, "Code Masquerading as Data", for more discussion of denial-of-service attacks.

23.3.1.3. Safe examples

Imagine you've got a CGI program that manages a form into which the user may enter an arbitrary Perl expression and get back the evaluated result.[14] Like all external input, the string comes in tainted, so Perl won't let you eval it yet--you'll first have to untaint it with a pattern match. The problem is that you'll never be able to devise a pattern that can detect all possible threats. And you don't dare just untaint whatever you get and send it through the built-in eval. (If you do that, we will be tempted to break into your system and delete the script.)

[14]Please don't laugh. We really have seen web pages that do this. Without a Safe!

That's where reval comes in. Here's a CGI script that processes a form with a single form field, evaluates (in scalar context) whatever string it finds there, and prints out the formatted result:

#!/usr/bin/perl -lTw
use strict;
use CGI::Carp 'fatalsToBrowser';
use CGI qw/:standard escapeHTML/;
use Safe;

print header(-type => "text/html;charset=UTF-8"),
      start_html("Perl Expression Results");
my $expr = param("EXPR") =~ /^([^;]+)/
            ? $1 # return the now-taintless portion
            : croak("no valid EXPR field in form");
my $answer = Safe->new->reval($expr);
die if $@;

print p("Result of", tt(escapeHTML($expr)),
               "is", tt(escapeHTML($answer)));

Imagine some evil user feeding you "print `cat /etc/passwd`" (or worse) as the input string. Thanks to the restricted environment that disallows backticks, Perl catches the problem during compilation and returns immediately. The string in $@ is "quoted execution (``, qx) trapped by operation mask", plus the customary trailing information identifying where the problem happened.

Because we didn't say otherwise, the compartments we've been creating all used the default set of allowable operations. How you go about declaring specific operations permitted or forbidden isn't important here. What is important is that this is completely under the control of your program. And since you can create multiple Safe objects in your program, you can confer various degrees of trust upon various chunks of code, depending on where you got them from.
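
For the curious, here is a hedged sketch of two compartments with different degrees of trust. It relies on the permit method and the time operator name as they appear in the Safe module's own documentation. Since that interface is exactly the part described as unsettled, treat this as an assumption to check against your installed version, not as a recommendation. The names $wary and $lenient are hypothetical.

```perl
use strict;
use warnings;
use Safe;

my $wary    = Safe->new;       # default mask only
my $lenient = Safe->new;
$lenient->permit('time');      # additionally allow the time operator

my $code = 'time';             # same code string for both compartments

my $lenient_result = $lenient->reval($code);
my $lenient_err    = $@;

my $wary_result = $wary->reval($code);
my $wary_err    = $@;

print "lenient compartment: ",
      defined $lenient_result ? "ok ($lenient_result)" : "refused: $lenient_err", "\n";
print "wary compartment:    ",
      defined $wary_result ? "ok ($wary_result)" : "refused: $wary_err";
```

Each compartment carries its own mask, so loosening one has no effect on the other.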

If you'd like to play around with Safe, here's a little interactive Perl calculator. It's a calculator in that you can feed it numeric expressions and immediately see their results. But it's not limited to numbers alone. It's more like the looping example under eval in Chapter 29, "Functions", where you can take whatever they give you, evaluate it, and give them back the result. The difference is that the Safe version doesn't execute just anything you feel like. You can run this calculator interactively at your terminal, typing in little bits of Perl code and checking the answers, to get a feel for what sorts of protection Safe provides.

#!/usr/bin/perl -w
# safecalc - demo program for playing with Safe
use strict;
use Safe;
my $sandbox = Safe->new();
while (1) {
    print "Input: ";
    my $expr = <STDIN>;
    exit unless defined $expr;
    chomp($expr);
    print "$expr produces ";
    local $SIG{__WARN__} = sub { die @_ };
    my $result = $sandbox->reval($expr, 1);
    if ($@ =~ s/at \(eval \d+\).*//) {
        printf "[%s]: %s", $@ =~ /trapped by operation mask/
            ? "Security Violation" : "Exception", $@;
    }
    else {
        print "[Normal Result] $result\n";
    }
}

Warning: the Safe module is currently being redesigned to run each compartment within a completely independent Perl interpreter inside the same process. (This is the strategy that Apache's mod_perl employs when running precompiled Perl scripts.) Details are still hazy at this time, but our crystal ball suggests that blindly poking at things inside the compartment using a named package won't get you very far after the impending rewrite. If you're running a version of Perl later than 5.6, check the release notes in perldelta(1) to see what's changed, or consult the documentation for the Safe module itself. (Of course, you always do that anyway, right?)

23.3.2. Code Masquerading as Data

Safe compartments are available for when the really scary stuff is going down, but that doesn't mean you should let down your guard totally when you're doing the everyday stuff around home. You need to cultivate an awareness of your surroundings and look at things from the point of view of someone wanting to break in. You need to take proactive steps like keeping things well lit and trimming the bushes that can hide various lurking problems.

Perl tries to help you in this area, too. Perl's conventional parsing and execution scheme avoids the pitfalls that shell programming languages often fall prey to. There are many extremely powerful features in the language, but by design, they're syntactically and semantically bounded in ways that keep things under the control of the programmer. With few exceptions, Perl evaluates each token only once. Something that looks like it's being used as a simple data variable won't suddenly go rooting around in your filesystem.

Unfortunately, that sort of thing can happen if you call out to the shell to run other programs for you, because then you're running under the shell's rules instead of Perl's. The shell is easy to avoid, though--just use the list argument forms of the system, exec, or piped open functions. Although backticks don't have a list-argument form that is proof against the shell, you can always emulate them as described in Section 23.1.3, "Accessing Commands and Files Under Reduced Privileges". (While there's no syntactic way to make backticks take an argument list, a multi-argument form of the underlying readpipe operator is in development; but as of this writing, it isn't quite ready for prime time.)
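
The sketch below shows the idea on a Unix-like system. It assumes a Perl recent enough (5.8 or later) to accept a command list after the '-|' mode in open, which is newer than the fork-and-exec emulation this text describes. The hostile argument is invented, and $^X (the running perl binary) stands in for whatever external command you would really call.

```perl
use strict;
use warnings;

# A hostile "filename" that a shell would happily split into two commands.
my $arg = 'report.txt; echo INJECTED';

# List-form piped open: the child is exec'd directly, with no shell involved,
# so $arg arrives as one literal argument. This stands in for backticks.
open(my $pipe, '-|', $^X, '-e', 'print $ARGV[0]', $arg)
    or die "cannot start child: $!";
my $output = do { local $/; <$pipe> };
close($pipe) or die "child failed: status $?";

# List-form system: the same guarantee, when you only need the exit status.
my $status = system($^X, '-e', 'exit(@ARGV == 1 ? 0 : 1)', $arg);

print "child saw: $output\n";
print "system exit status: $status\n";
```

Because no shell ever parses the string, the semicolon in $arg is just another character.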

When you use a variable in an expression (including when you interpolate it into a double-quoted string), there's No Chance that the variable will contain Perl code that does something you aren't intending.[15] Unlike the shell, Perl never needs defensive quotes around variables, no matter what might be in them.

$new = $old;                # No quoting needed.
print "$new items\n";       # $new can't hurt you.

$phrase = "$new items\n";   # Nor here, neither.
print $phrase;              # Still perfectly ok.

Perl takes a "what you see is what you get" approach. If you don't see an extra level of interpolation, then it doesn't happen. It is possible to interpolate arbitrary Perl expressions into strings, but only if you specifically ask Perl to do that. (Even so, the contents are still subject to taint checking if you're in taint mode.)

$phrase = "You lost @{[ 1 + int rand(6) ]} hit points\n";

Interpolation is not recursive, however. You can't just hide an arbitrary expression in a string:

$count = '1 + int rand(6)';             # Some random code.
$saying = "$count hit points";          # Merely a literal.
$saying = "@{[$count]} hit points";     # Also a literal.

Both assignments to $saying would produce "1 + int rand(6) hit points", without evaluating the interpolated contents of $count as code. To get Perl to do that, you have to call eval STRING explicitly:

$code = '1 + int rand(6)';
$die_roll = eval $code;
die if $@;

If $code were tainted, that eval STRING would raise its own exception. Of course, you almost never want to evaluate random user code--but if you did, you should look into using the Safe module. You may have heard of it.

[15]Although if you're generating a web page, it's possible to emit HTML tags, including JavaScript code, that might do something that the remote browser isn't expecting.

There is one place where Perl can sometimes treat data as code; namely, when the pattern in a qr//, m//, or s/// operator contains either of the new regular expression assertions, (?{CODE}) or (??{CODE}). These pose no security issues when used as literals in pattern matches:

$cnt = $n = 0;
while ($data =~ /( \d+ (?{ $n++ }) | \w+ )/gx) {
    $cnt++;
}
print "Got $cnt words, $n of which were digits.\n";

But existing code that interpolates variables into matches was written with the assumption that the data is data, not code. The new constructs might have introduced a security hole into previously secure programs. Therefore, Perl refuses to evaluate a pattern if an interpolated string contains a code assertion, and raises an exception instead. If you really need that functionality, you can always enable it with the lexically scoped use re 'eval' pragma. (You still can't use tainted data for an interpolated code assertion, though.)
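
A short sketch of both behaviors follows. The pattern string and the $hits counter are made up for the example, and the string is an ordinary untainted literal standing in for data that arrived from elsewhere.

```perl
use strict;
use warnings;

our $hits = 0;
my $pattern = 'a(?{ $main::hits++ })';   # pretend this arrived as data

# Without the pragma, Perl refuses to compile the interpolated code assertion.
my $refused = !eval { my $m = 'a' =~ /$pattern/; 1 };
my $message = $@;

# Inside the lexical scope of use re 'eval', the same match is permitted.
{
    use re 'eval';
    my $m = 'a' =~ /$pattern/;
}

print "refused without pragma: ", ($refused ? "yes" : "no"), "\n";
print "exception was: $message";
print "code assertion ran $hits time(s) with the pragma\n";
```

Keeping the pragma inside the smallest possible block limits how much of the program opts in to treating pattern data as code.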

A completely different sort of security concern that can come up with regular expressions is denial-of-service problems. These can make your program quit too early, or run too long, or exhaust all available memory--and sometimes even dump core, depending on the phase of the moon.

When you process user-supplied patterns, you don't have to worry about interpreting random Perl code. However, the regular expression engine has its own little compiler and interpreter, and the user-supplied pattern is capable of giving the regular expression compiler heartburn. If an interpolated pattern is not a valid pattern, a run-time exception is raised, which is fatal unless trapped. If you do try to trap it, make sure to use only eval BLOCK, not eval STRING, because the extra evaluation level of the latter would in fact allow the execution of random Perl code. Instead, do something like this:

if (not eval { "" =~ /$match/; 1 }) {
    # (Now do whatever you want for a bad pattern.)
}
else {
    # We know pattern is at least safe to compile.
    if ($data =~ /$match/) { ... }
}

A more troubling denial-of-service problem is that given the right data and the right search pattern, your program can appear to hang forever. That's because some pattern matches require exponential time to compute, and this can easily exceed the MTBF rating on our solar system. If you're especially lucky, these computationally intensive patterns will also require exponential storage. If so, your program will exhaust all available virtual memory, bog down the rest of the system, annoy your users, and either die with an orderly "Out of memory!" error or else leave behind a really big core dump file, though perhaps not as large as the solar system.

Like most denial-of-service attacks, this one is not easy to solve. If your platform supports the alarm function, you could time out the pattern match. Unfortunately, Perl cannot (currently) guarantee that the mere act of handling a signal won't ever trigger a core dump. (This is scheduled to be fixed in a future release.) You can always try it, though, and even if the signal isn't handled gracefully, at least the program won't run forever.

If your system supports per-process resource limits, you could set these in your shell before calling the Perl program, or use the BSD::Resource module from CPAN to do so directly from Perl. The Apache web server allows you to set time, memory, and file size limits on CGI scripts that it launches.

Finally, we hope we've left you with some unresolved feelings of insecurity. Remember, just because you're paranoid doesn't mean they're not out to get you. So you might as well enjoy it.