Examples (CGI Programming with Perl)

3.4. Examples

At this point, we have covered the fundamentals of how CGI scripts work, but the concepts may still seem a little abstract. The following sections present examples that demonstrate how to implement what we've discussed.

3.4.1. Check the Client Browser

CGI scripts are not restricted to generating HTML. Example 3-4 produces an image after choosing an image format that the browser supports. Recall that the browser sends the Accept HTTP header listing the media types it supports. Actually, browsers generally specify only the newer media types they support and pass a wildcard to match everything else. In this example, we'll send an image in the new PNG format if the browser specifies that it supports PNG, and a JPEG otherwise.

You may ask why we would want to do this. Well, JPEG images use a lossy form of compression. Although they are ideal for natural images like photographs, images with sharp lines and details (such as screenshots or text) can become blurred. PNG images, like GIF images, do not use lossy compression. They are typically larger than JPEG images (it depends on the image), but they provide sharp detail. And unlike GIFs, which are limited to 256 colors, PNGs can support millions of colors and even eight-bit transparency. So we will provide a high-color, high-detail PNG if possible, or a JPEG otherwise.

If a user calls this with http://localhost/cgi/image_fetch.cgi/new_screenshot.png, he or she will actually get new_screenshot.png or new_screenshot.jpeg depending on what the browser supports. This allows you to include a single link in your HTML pages that works for everyone. Example 3-4 shows the source to our CGI script.

Example 3-4. image_fetch.cgi

#!/usr/bin/perl -wT

use strict;

my $image_type = $ENV{HTTP_ACCEPT} =~ m|image/png| ? "png" : "jpeg";
my( $basename ) = $ENV{PATH_INFO} =~ /^(\w+)/;
my $image_path = "$ENV{DOCUMENT_ROOT}/images/$basename.$image_type";

unless ( $basename and -B $image_path and open IMAGE, $image_path ) {
    print "Location: /errors/not_found.html\n\n";
    exit;
}

my $buffer;
print "Content-type: image/$image_type\n\n";
binmode;

while ( read( IMAGE, $buffer, 16_384 ) ) {
    print $buffer;
}

We set $image_type to "png" or "jpeg" depending on whether the browser sent image/png as part of its Accept header. Then we set $basename to the first word of the additional path information, which is "new_screenshot" in our previous example. We only care about the base name because we add our own extension when we actually fetch the file.

Our images are in the images directory at the root of the web server's document tree, so we build a path to the image and assign it to $image_path. Note that we build this path before we validate that the URL we received actually contains additional path information. If $ENV{PATH_INFO} is empty or starts with a nonalphanumeric character, then obviously this path is invalid. That's okay though; we will validate this in the next step.

We delayed the validation so we can perform all of our tests at once. We test that the additional path information contains a name, that the full path to the file we constructed points to a binary file, and that we are able to open the file. If any of these tests fail, then we simply report that the file is not found. We do this by forwarding to a static page that contains our error message. Creating a single, static document for general errors like 404 Not Found is an easy way to produce error pages that are customized to match your site design and are easy to maintain.

If we opened the file successfully, we read and print the file in 16KB increments. Calling binmode is necessary for systems like Win32 or MacOS that do not use newlines as the end-of-line character; it doesn't hurt on Unix systems.

3.4.2. User Authentication and Identification

In addition to domain-based security, most HTTP servers also support another method of security, known as user authentication. We discussed user authentication briefly in the last chapter. When configured for user authentication, specified files or directories within a given realm are set up to allow access only by certain users. A user attempting to open the URLs associated with these files is prompted for a name and password.

The username and password is checked by the server, and if legitimate, the user is allowed access. In addition to allowing the user access to the protected file, the server also maintains the user's name and passes it to any subsequent CGI programs that are called. The server passes the username in the $ENV{REMOTE_USER} environment variable.

A CGI script can therefore use server authentication information to identify users. Here is a snippet of code that illustrates what you can do with the $ENV{REMOTE_USER} environment variable:

$remote_user = $ENV{REMOTE_USER};

if ( $remote_user eq "mary" ) {
    print "Welcome Mary, how is your company doing these days?\n";
} elsif ( $remote_user eq "bob" ) {
    print "Hey Bob, how are you doing? I heard you were sick.\n";
}

3.4.3. Restricting Image Hijacking

One of the great benefits of the Web is its flexibility. One person can create a page on their server and include links to others' pages on other servers. These links can even include links to images on other servers. Unfortunately, if you have popular images, you may not appreciate this last feature. Say, for example, you are an artist and you display your images on your web site. You may not want other sites to include your artwork in their web pages simply by including image links pointing to your server. One solution, shown in Example 3-5, is to check the URL that referred the user to the image via the Referer HTTP header field.[5]

[5]The Referer header is not as reliable as you might hope. Not all browsers provide it, and as we will see in Chapter 8, "Security", it's possible for clients to provide a false Referer header. However, in this scenario, the culprits are other servers, not the users themselves, and it is not possible for other servers to cause clients to provide false headers.

Example 3-5. check_referer.cgi

#!/usr/bin/perl -wT

use strict;

# The directory where images are stored; this shouldn't be in the 
# server's doc tree so users can't browse images except via this script.
my $image_dir = "/usr/local/apache/data/images";

my $referer  = $ENV{HTTP_REFERER};
my $hostname = quotemeta( $ENV{HTTP_HOST} || $ENV{SERVER_NAME} );

if ( $referer and $referer !~ m|^http://$hostname/| ) {
    display_image( "copyright.gif" );
}
else {
    # Verify that the image name doesn't contain any unsafe characters.
    my( $image_file ) = $ENV{PATH_INFO} =~ /^([\w+.]+)$/ or
        not_found(  );
    display_image( $image_file );
}
sub display_image {
    my $file = shift;
    my $full_path = "$image_dir/$file";
    
    # We'll simply report that the file isn't found if we can't open it.
    open IMAGE, $full_path or not_found(  );
    
    print "Pragma: no-cache\n";
    print "Content-type: image/gif\n\n";
    
    binmode;
    my $buffer = "";
    while ( read( IMAGE, $buffer, 16_384 ) ) {
        print $buffer;
    }
    close IMAGE;
}

sub not_found {
    print <<END_OF_ERROR;
Status: 404 Not Found
Content-type: text/html

<html>
<head>
  <title>File Not Found</title>
</head>

<body>
  <h1>File Not Found</h1>
  
  <p>Sorry, but you requested an image that could not be found. 
    Please check the URL you entered and try again.</p>
</body>
</html>
END_OF_ERROR
    
    exit;
}

This script displays an image with a copyright notice if the user came from a different web site. For the copyright notice, the script assumes that there is a file called copyright.gif in the same directory as the other images. Not all browsers implement the Referer HTTP header, and we don't want visitors using these browsers to get the wrong image in error. So we only display the copyright image if the user both presents a Referer header and it is from a different server. Also, we have to be conscious of caching on the Web. Browsers might cache images, and they may be behind any number of proxy servers that also implement their own caches. Thus, we output an additional header to request that this message not be cached. This should avoid the user getting a cached copyright notice image when they visit the real site. If you are especially paranoid (and do not mind the extra traffic it causes), then you could also output a Pragma: no-cache header for the real images too.

If the image is not found, it sends a response with a 404 status. You may wonder why it would send an HTML message when it was likely the request was the result of an image tag and the browser is planning on displaying the response as an image in an HTML page. Actually, neither web servers nor CGI scripts have any way of determining the context of any request. Web servers always display 404 errors when they cannot locate a resource. In this case the browser will likely display an icon, such as a broken image, to indicate that there was an error. If the user chooses to view the image separately by directly referencing it, he or she will see the error message.

This solution should stop casual hijackers. It won't stop thieves. It's always possible for someone to visit your site, download your images, and put copies of them up on their own site.