Outputting Image Data (CGI Programming with Perl)

13.2.1. An Example

Example 13-1 shows a CGI script that returns a random image each time it is called.

Example 13-1. random_image.cgi

#!/usr/bin/perl -wT

use strict;
use CGI;
use CGI::Carp;

use constant BUFFER_SIZE     => 4_096;
use constant IMAGE_DIRECTORY => "/usr/local/apache/data/random-images";

my $q = new CGI;
my $buffer = "";

my $image = random_file( IMAGE_DIRECTORY, '\\.(png|jpg|gif)$' );
my( $type ) = $image =~ /\.(\w+)$/;
$type eq "jpg" and $type = "jpeg";

print $q->header( -type => "image/$type", -expires => "-1d" );
binmode STDOUT;

local *IMAGE;
open IMAGE, IMAGE_DIRECTORY . "/$image" or die "Cannot open file $image: $!";
while ( read( IMAGE, $buffer, BUFFER_SIZE ) ) {
    print $buffer;
}
close IMAGE;


# Takes a path to a directory and an optional filename mask regex
# Returns the name of a random file from the directory
sub random_file {
    my( $dir, $mask ) = @_;
    my $i = 0;
    my $file;
    local *DIR, $_;
    
    opendir DIR, $dir or die "Cannot open $dir: $!";
    while ( defined ( $_ = readdir DIR ) ) {
        /$mask/o or next if defined $mask;
        rand ++$i < 1 and $file = $_;
    }
    closedir DIR;
    return $file;
}

This CGI script starts like our other CGI scripts, but the random_file function requires a little explanation. We pass the random_file function the path to our image directory and a regular expression that matches GIF, PNG, and JPEG files extensions. The algorithm that random_file uses is adopted from an algorithm for selecting a random line from a text file that appears in the perlfaq5 manpage (it originally appeared in Programming Perl ):

rand($.) > 1 && ( $line = $_ ) while <>;

This code selects a line from a text file, by reading the file only once, and needing to store only two lines in memory at a time. It always sets $line to the first line, then there is a one in two chance that it will set it to the second line, a one in three chance for the third line, etc. The probabilities always balance out, no matter how many lines are in the file.

Likewise, we apply this technique to reading files in a directory. We first discard any files that do not match our mask if we supplied a mask. We then apply the algorithm to determine whether to store the current filename. The last filename we happen to store is what we ultimately return.

Now we return to the body of our CGI script and use the extension of the file to determine the media type of our image. Because the media type for JPEG files (image/jpeg) differs from the common extension for JPEGs (.jpg), we convert these.

Next we print our header with the corresponding media type for our image as well as an Expires header to discourage the browser from caching this response. Unfortunately, this header does not always work; we will discuss this further in a moment.

13.2.1.1. binmode

After printing our header, we use Perl's built-in function binmode to indicate that we are outputting binary data. This is important. On Unix systems,binmode does nothing (thus on these systems it can be omitted), but on Windows, MacOS, and other operating systems that do not use a single newline as an end-of-line character, it disables automatic end-of-line translation that may otherwise corrupt binary output.

Finally, we read and output the image data. Note that because it is a binary file, there are no standard line endings, so we must use read instead of <> used on text files.

13.2.2. Including Dynamic Images in HTML

You can include a dynamic image in one of your HTML documents the same way you include standard images: via a URL. For example, the following tag displays a random image using our previous example:

<IMG SRC="/cgi/random_image.cgi" >

13.2.2.1. Redundant path information

Unfortunately, there are some browsers (specifically some versions of Internet Explorer) that sometimes pay more attention to the extension of a resource they are fetching than to the HTTP media type header. According to the HTTP standard, this is wrong of course, and probably an accidental bug, but if you want to accommodate users of these browsers, you may wish to append redundant path information onto URLs to provide an acceptable file extension:

<IMG SRC="/cgi/survey_graph.cgi/survey.png" >

The web server will still execute survey_graph.cgi, which generates the image while ignoring the additional /survey.png path information.

Incidentally, adding false path information like this is a good idea whenever your CGI script is generating content that you expect users to save, because browsers generally default to the filename of the resource they requested, and the user probably would rather the file be saved as survey.png than survey_graph.cgi.

For CGI scripts like random_image.cgi that determine the filename and/or extension dynamically, you can still accomplish this with redirection. For example, we could replace the line that sets $image in random_image.cgi (Example 13-1) with the following lines:

my( $image ) = $q->path_info =~ /(\w+\.\w+)$/;

unless ( defined $image and -e IMAGE_DIRECTORY . "/$image" ) {
    $image = random_file( IMAGE_DIRECTORY, '\\.(png|jpg|gif)$' );
    print $q->redirect( $q->script_name . "/$image" );
    exit;
}

The first time this script is accessed, there is no additional path information, so it fetches a new image from our random_file function and redirects to itself with the filename appended as path information. When this second request arrives, the script retrieves the filename from the path information and uses this if the filename matches our regular expression and it exists. If it isn't a valid filename, the script acts as if no path had been passed and generates a new filename.

Note that our filename regular expression, /(\w+\.\w+)$/, prevents any images in our image directory that have characters not matched by \w from being displayed, including images that contain hyphens. Depending on the filenames you are using, you may need to adjust this pattern.

13.2.2.2. Preventing caching

In Example 13-1, we generated an Expires HTTP header in order to discourage caching. Unfortunately, not all browsers respect this header, so it is quite possible for a user to get a stale image instead of a dynamic one. Some browsers also try to determine whether a resource is generated dynamically by something such as a CGI script or whether it is static; these browsers seem to assume that images are static, especially if you append additional path information as we just discussed.

There is a way to force browsers not to cache images, but this requires that the tag for the image also be dynamically generated. In these circumstances, you can add a value that constantly changes, such as the time in seconds, to the URL:

my $time = time;
print $q->img( { -src => "/cgi/survey_graph.cgi/$time/survey.png" } );

By adding the time to the additional path information, the browser views each request (more than a second apart) as a new resource. However, this technique does fill the browser's cache with duplicate images, so use it sparingly, and always combine this with an Expires header for the sake of browsers that support it. Adding a value like this to the query string also works:

print $q->img( { -src => "/cgi/survey_graph.cgi/survey.png?random=$time" } );

If nothing else on the HTML page is dynamic, and you do not wish to convert it to a CGI script, then you can also accomplish this via a server-side include (see Chapter 6, "HTML Templates"):

<!--#config timefmt="%d%m%y%H%M%S" -->
<IMG SRC="/cgi/survey_graph.cgi/<!--#echo var="DATE_LOCAL"-->/survey.png">

Although this is a little hard to read and is syntactically invalid HTML, the SSI tag will be parsed by an SSI-enabled server and replaced with a number representing the current date and time before it is sent to the user.