The CGI Specification (MySQL & mSQL)

9.3. The CGI Specification

So what exactly is the "set of rules" that enables a CGI program in, say, Batavia, Illinois to communicate with a web browser in Outer Mongolia? The official CGI specification, along with lots of other nifty CGI information, can be found on NCSA's web site at http://hoohoo.ncsa.uiuc.edu/cgi/. However, the reason this chapter exists is so that you don't have to make the long trek to your web browser and look it up yourself.

There are four methods by which CGI passes information between the CGI program and the web server -- and hence to the web client:

Environment variables
Command line
Standard input
Standard output

Using these four methods, the server sends all of the information provided by the client to the CGI program. The CGI program then does its magic and sends the output back to the server where it is forwarded to the client.

NOTE

This information is written with the Apache HTTP server in mind. Apache is the most widely used web server and is available for virtually all platforms, including Windows 9x and Windows NT. However, this information should also apply to all HTTP servers that support CGI. Some of the more proprietary servers, such as those from Microsoft and Netscape, may have additional features or slightly different operation. As the face of the web is still changing at an incredible speed, standards are still in flux and there will undoubtedly be changes. However, CGI itself seems to have somewhat stabilized -- at the expense of being overshadowed by other technologies, such as applets. Any CGI programs you write using this information will almost certainly be supported by most web servers for many years.

When a CGI program is invoked via a form, the most popular interface used, the browser passes the server a long string that begins with the path and name of the CGI program. Following that is optional extra data called path information, which is passed to the CGI program via the PATH_INFO environment variable. After the path information comes a "?" symbol followed by form data that is sent to the server using the HTTP GET method. This data is available to the CGI program through the QUERY_STRING environment variable. Finally, any form data coming from the page itself through a POST form, the most common type, is sent to the server using the HTTP POST method. This data is passed to the CGI program through the standard input. A typical string passed from the browser to the server is shown in Figure 9-1. The program named formread, in the directory cgi-bin, is invoked by the server with the extra path information extra/information and the query data choice=help -- most likely as part of the original URL. Finally, the form data itself (the text "CGI programming" entered into a field labeled "keywords") is sent via an HTTP POST.

Figure 9-1. Parts of the string passed from browser to server
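As a rough illustration of how the request in Figure 9-1 breaks apart, the following C sketch separates the extra path information from the query data. The function split_request is our own hypothetical helper, not part of any CGI library, and it assumes the path of the script is already known. A real web server works this out for itself and hands your program the results in PATH_INFO and QUERY_STRING.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of any CGI library. Given the part of the
   URL after the host name and the known path of the script, it copies the
   extra path (PATH_INFO) and the query (QUERY_STRING) into caller buffers.
   Returns 0 on success, or -1 if the URL does not start with the script
   path or one of the buffers is too small. */
int split_request(const char *url, const char *script,
                  char *path_info, size_t path_size,
                  char *query, size_t query_size)
{
    assert(url != NULL && script != NULL);
    assert(path_info != NULL && query != NULL);

    size_t script_len = strlen(script);
    if (strncmp(url, script, script_len) != 0)
        return -1;

    /* What follows the script name must be nothing, a path, or a query. */
    const char *rest = url + script_len;
    if (rest[0] != '\0' && rest[0] != '/' && rest[0] != '?')
        return -1;

    const char *mark = strchr(rest, '?');   /* start of the query, if any */
    size_t path_len = mark ? (size_t)(mark - rest) : strlen(rest);
    const char *q = mark ? mark + 1 : "";

    if (path_len >= path_size || strlen(q) >= query_size)
        return -1;

    memcpy(path_info, rest, path_len);
    path_info[path_len] = '\0';
    strcpy(query, q);
    return 0;
}
```

Called with the URL /cgi-bin/formread/extra/information?choice=help and the script path /cgi-bin/formread, the sketch yields the path information /extra/information and the query data choice=help. The POST data never appears in the URL at all, which is why it is not handled here.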

9.3.1. Environment Variables

When the server executes a CGI program, the first thing it does is give the program some information to work with in the form of environment variables. Seventeen variables are officially defined in the specification, and a great many unofficial ones are available through the HTTP_ mechanism described below. Your CGI program can access these environment variables just as it would access the environment variables of the shell if it were run from the command line. In a shell script, for instance, the environment variable FOO could be accessed as $FOO; in Perl it would be $ENV{'FOO'}; in C, getenv("FOO"); and so on. Table 9-1 lists the variables that are always set by the server -- even if only to a null value. In addition to these variables, information sent by the client in the headers of the request is included as variables of the form HTTP_FOO, where FOO is the name of the header. For example, most web browsers include version information in a header named User-Agent. Your CGI program can access it as the environment variable HTTP_USER_AGENT.

Table 9-1. The CGI Environment Variables

Environment Variable	Description
CONTENT_LENGTH	The length, in bytes, of the data provided by the `POST` or `PUT` method.
CONTENT_TYPE	The MIME type of any data attached via a `POST` or `PUT` method.
GATEWAY_INTERFACE	The version number of the CGI specification supported by the server.
PATH_INFO	Extra path information provided by the client. For example, in a request of the form http://www.myserver.com/test.cgi/this/is/a/path?field=green, /this/is/a/path will be the value of the `PATH_INFO` variable.
PATH_TRANSLATED	This is the same as `PATH_INFO` except any translation that is possible, such as expanding "~account" names, is done by the server.
QUERY_STRING	Any information following the "?" in the URL. This is also the information provided in a form if the `REQUEST_METHOD` is `GET`.
REMOTE_ADDR	The IP address of the client making the request.
REMOTE_HOST	The hostname, if available, of the client making the request.
REMOTE_IDENT	If the web server and the client both support identd-style identification, this will be the username of the account making the request.
REQUEST_METHOD	The method which the client used to make the request. For the run-of-the-mill CGI programs of the type we are going to make, this will usually be `POST` or `GET`.
SCRIPT_NAME	The path given by the client to run the script. This can be used for self-referencing URLs, and so that scripts that are linked in different places can react differently depending on their location.
SERVER_NAME	The hostname -- or IP number, if the hostname is not available -- of the machine on which the web server is running.
SERVER_PORT	The port number the web server is using.
SERVER_PROTOCOL	Protocol by which the client is communicating with the server. For our purposes, it will almost always be HTTP.
SERVER_SOFTWARE	Version information for the web server executing the CGI program.
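The HTTP_ naming rule for request headers, mentioned before Table 9-1, can be sketched in a few lines of C. The function header_to_env is a hypothetical name of our own; the server performs this conversion itself, so a CGI program never needs to. The sketch only shows how a header name such as User-Agent maps onto the variable your program sees.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: builds the environment variable name for a request
   header, e.g. "User-Agent" becomes "HTTP_USER_AGENT". Letters are made
   uppercase and each '-' becomes '_'. Returns 0 on success, or -1 if the
   output buffer is too small. */
int header_to_env(const char *header, char *out, size_t out_size)
{
    static const char prefix[] = "HTTP_";

    assert(header != NULL && out != NULL);

    size_t prefix_len = sizeof prefix - 1;
    size_t len = strlen(header);

    if (prefix_len + len + 1 > out_size)
        return -1;

    memcpy(out, prefix, prefix_len);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)header[i];
        out[prefix_len + i] = (c == '-') ? '_' : (char)toupper(c);
    }
    out[prefix_len + len] = '\0';
    return 0;
}
```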

The following is an example CGI script in Perl which prints out all of the environment variables set by the server -- as well as any inherited variables, such as PATH, which are set by the shell that executed the server.

#!/usr/bin/perl -w

print <<HTML;
Content-type: text/html

<html><head><title>Environment Variables</title></head><body>
<p>Environment Variables
<p>
HTML

foreach (keys %ENV) { print "$_: $ENV{$_}<br>\n"; }

print <<HTML;
</body></html>
HTML

Any of these variables can be used, or even modified, by your CGI program. However, none of the changes affect the web server that spawned your program.
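For comparison, reading the same variables from C might look something like the sketch below. The helpers env_or and print_request_summary are hypothetical names of our own. The fallback text "(unset)" covers only the case in which a variable is missing entirely; a variable that the server has set to an empty value comes back as an empty string.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns the value of an environment variable, or
   the given fallback when the variable is not set at all. */
const char *env_or(const char *name, const char *fallback)
{
    assert(name != NULL);
    const char *value = getenv(name);
    return value != NULL ? value : fallback;
}

/* Hypothetical helper: prints a few of the variables from Table 9-1 as a
   plain text CGI response on the given stream. */
void print_request_summary(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "Content-Type: text/plain\n\n");
    fprintf(out, "method: %s\n", env_or("REQUEST_METHOD", "(unset)"));
    fprintf(out, "query:  %s\n", env_or("QUERY_STRING", "(unset)"));
    fprintf(out, "client: %s\n", env_or("REMOTE_ADDR", "(unset)"));
}
```

A complete program would simply call print_request_summary(stdout) from main().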

9.3.2. Command Line

A little-used feature of CGI allows arguments to be passed as command line parameters to a CGI program. The feature is little used because it has only a few practical applications, so we won't dwell on it here. Basically, if the QUERY_STRING environment variable does not contain an "=" symbol, the CGI program is executed with the contents of QUERY_STRING as its command line arguments. For instance, http://www.myserver.com/cgi-bin/finger?root will execute finger root on www.myserver.com.

Command line parameters are most commonly used in conjunction with the <ISINDEX> HTML tag. The <ISINDEX> tag is a miniform contained in a single tag. When a browser encounters an <ISINDEX> tag, it displays a text box in which the user can enter a query string. Upon submission -- usually after the user presses the "Enter" key -- the browser extracts a URL from the <ISINDEX> tag and calls it, passing the words of the query string as the command line.

For example, the finger CGI mentioned earlier could be written so that, if called with no arguments, it outputs an HTML page that contains an <ISINDEX> tag. The user would then enter an address into the field, and finger would be executed as described above.
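The rule that decides whether a query ends up on the command line is simple enough to sketch in C. Both function names below are hypothetical. We also assume, following the CGI specification rather than anything stated above, that the words of the query are separated by "+" characters; the server does this splitting before your program ever runs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a query string would be handed to the
   program as command line arguments (it is non-empty and contains no
   '='), and 0 otherwise. */
int is_command_line_query(const char *query)
{
    assert(query != NULL);
    return query[0] != '\0' && strchr(query, '=') == NULL;
}

/* Hypothetical helper: counts the '+'-separated words in such a query,
   which is the number of arguments the program would receive. */
size_t count_query_words(const char *query)
{
    assert(query != NULL);

    size_t words = 0;
    int in_word = 0;

    for (const char *p = query; *p != '\0'; p++) {
        if (*p == '+') {
            in_word = 0;            /* a separator ends the current word */
        } else if (!in_word) {
            in_word = 1;            /* first character of a new word */
            words++;
        }
    }
    return words;
}
```

Under these assumptions the query root is a one-word command line, while choice=help is ordinary form data and is left in QUERY_STRING only.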

9.3.3. Standard Input

As mentioned above, if a client sends information via a PUT or POST HTTP request, the length and MIME type of that information are put into the CONTENT_LENGTH and CONTENT_TYPE environment variables, respectively. The actual data is sent to the CGI program's standard input. No end-of-data marker is necessarily sent to the program, so it must examine the CONTENT_LENGTH variable and read only that number of bytes. This is the primary method of transferring data from forms, and we will use it almost exclusively in our examples.
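Reading exactly CONTENT_LENGTH bytes, and no more, might be done along the lines of the following C sketch. The function read_body and its max_length parameter are our own inventions for illustration; the cap simply keeps a bogus length from triggering a huge allocation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads exactly the number of bytes announced in
   CONTENT_LENGTH from the given stream. Returns a NUL-terminated buffer
   that the caller must free(), or NULL on error. On success *length
   receives the number of bytes read. max_length is an arbitrary cap
   chosen by the caller. */
char *read_body(FILE *in, const char *content_length,
                size_t max_length, size_t *length)
{
    assert(in != NULL && length != NULL);

    if (content_length == NULL || content_length[0] == '\0')
        return NULL;                    /* no body was announced */

    char *end = NULL;
    unsigned long wanted = strtoul(content_length, &end, 10);
    if (*end != '\0' || wanted > max_length)
        return NULL;                    /* not a number, or too large */

    char *body = malloc((size_t)wanted + 1);
    if (body == NULL)
        return NULL;

    size_t got = fread(body, 1, (size_t)wanted, in);
    if (got != (size_t)wanted) {        /* the stream ended early */
        free(body);
        return NULL;
    }

    body[got] = '\0';
    *length = got;
    return body;
}
```

In a CGI program the call would be something like read_body(stdin, getenv("CONTENT_LENGTH"), 65536, &n), where 65536 is just an example limit.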

Many libraries exist for almost all imaginable languages that perform the essential set-up tasks of a CGI program for you, including determining whether the incoming data was sent via the GET or POST method and either parsing the QUERY_STRING environment variable or reading the standard input, respectively. These libraries then place the data into easily accessible variables. A couple of the more common libraries are listed below. For the purely biased reason that we don't know every language out there, we will go into detail only for libraries that work with Perl and C. However, CGI can be very powerful in just about any language. An extensive list of CGI resources for various languages can be found on Yahoo at http://www.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/CGI___Common_Gateway_Interface/.

9.3.4. Accepting Input in Perl

Most of the rest of this section contains examples in Perl and C. This does not mean that Perl and C are any better, or worse, than other languages, but simply that many people have found them very useful for CGI work. In particular, because of the popularity of Perl in this area, we still do the vast majority of our CGI work in it. We would, however, also strongly recommend you take a look at Python if you have not yet made a language decision for CGI programs.

Two major libraries provide CGI interfaces for Perl. The first is cgi-lib.pl.[12] The cgi-lib.pl library is very common because for a while it was the only major library available. It is designed to work under Perl 4, but still works under Perl 5. The other library, CGI.pm,[13] is more recent and in many ways supersedes cgi-lib.pl. CGI.pm is written for Perl 5 and uses an entirely object-oriented scheme for dealing with CGI data. The CGI.pm module parses the standard input and QUERY_STRING variable and stores the data in a CGI object. Your program needs only to create a new CGI object and use simple methods like param() to retrieve the data in which you are interested. Example 9-1 is a short example that shows how CGI.pm interprets data. All of the Perl examples in this section will use CGI.pm.

[12]http://www.bio.cam.ac.uk/cgi-lib/

[13]http://www-genome.wi.mit.edu/ftp/pub/software/WWW/cgi_docs.html

Example 9-1. Parsing CGI Data in Perl

#!/usr/bin/perl -w

use CGI qw(:standard);  # Use the CGI.pm module. The qw(:standard) imports the
                        # namespace of the standard CGI functions to allow for
                        # clearer code. This can only be done if only one CGI
                        # object will be used throughout the script.

$mycgi = new CGI; # Create a CGI object, which will be our 'gateway' to the form
                  # data.

@fields = $mycgi->param; # This retrieves the names of all of the form fields
                         # entered.


print header, start_html('CGI.pm test'); # The 'header' function prints out the
                              # required HTTP header, and 'start_html' prints
                              # out the HTML header with the title given,
                              # along with the <BODY> tag.
print "<p>Form information:<br>";


foreach (@fields) { print $_, ":", $mycgi->param($_), "<br>"; }
# For each of the fields, print out the field name along with the value (which
# is obtained through $mycgi->param('fieldname')).

print end_html; # A shortcut provided to print the "</body></html>" ending tags.

9.3.5. Accepting Input in C

Since the primary MySQL and mSQL APIs are written in C, we will not completely abandon it for Perl, but instead provide a few C examples where appropriate. There are three widely used C libraries for CGI programming: cgic by Tom Boutell;[14] cgihtml by Eugene Kim;[15] and libcgi from EIT.[16] We have found cgic to be the most complete and easiest to use. However, it lacks the ability to list all of the form variables if you do not know them beforehand. This ability can actually be added by means of a trivial patch, but that is beyond the scope of this chapter. Thus, to mimic the example Perl script used earlier, we use the cgihtml library in Example 9-2.

[14]http://www.boutell.com/cgic/

[15]http://hcs.harvard.edu/~eekim/web/cgihtml/

[16]http://wsk.eit.com/wsk/dist/doc/libcgi/libcgi.html

Example 9-2. Parsing CGI Data in C

/* cgihtmltest.c - A generic CGI program to print out the keys and values
	of the submitted form data.
*/

#include <stdio.h>
#include "cgi-lib.h"  /* This contains all of the definitions for the CGI 
                      functions */
#include "html-lib.h" /* This contains all of the definitions for the HTML
                      helper functions */

void print_all(llist l)
/* This function prints out all of the data submitted by the form in the
same format as the above Perl example. Cgihtml also provides a built-in
function, print_entries(), which does the same thing using a fixed
HTML definition list format.
*/
{
  node* window; 
/* The 'node' type is defined by the cgihtml library and refers to the 
linked list which stores all of the form data. */

  window = l.head; 
/* This sets a pointer at the beginning of the form data */

  while (window != NULL) {
/* Go through the linked list until you reach the last (the first empty) entry */

    printf("  %s:%s<br>\n",window->entry.name,
        replace_ltgt(window->entry.value));
/* Print out the data. Replace_ltgt() is a provided function which HTML encodes
the text so that it will show up correctly on the client browser. */

    window = window->next;
/* Go to the next entry in the list. */

  }
}


int main()
{
  llist entries; /* This holds the linked list of parsed form data */
  int status; /* This is a status integer provided by the library */

  html_header(); 
/* This is an HTML-helper function which prints the HTML header */

  html_begin("cgihtml test");
/* This is an HTML-helper function which prints the beginning of the HTML
page with the specified title. */

  status = read_cgi_input(&entries);
/* This reads in and parses the form data */
  printf("<p>Form information:<br>");
  print_all(entries);
/* Call the print_all() function defined above. */
  html_end();
/* This is an HTML-helper function which prints the end of the HTML page. */
  list_clear(&entries);
/* This frees the memory used by the form data. */
  return 0;
}

9.3.6. Standard Output

Data sent by the CGI program to the standard output will be read by the web server and sent to the client. If the name of the script begins with nph-, the data is sent straight to the client without any interference from the web server. In this case, it is up to the CGI program to provide a valid HTTP header that will be understood by the client. Otherwise, let the web server create the HTTP header for you.

Even if you do not use an nph- script, you must still give the server one directive which tells it something about your output. Most commonly, this will be a Content-Type HTTP header, but it could also be a Location header. The headers should be followed by a blank line -- that is, a bare linefeed or CR/LF combination.

The Content-Type header tells the server what type of data is being output by your CGI program. If the output is an HTML page, the line should be Content-Type: text/html. The Location header gives the server another URL -- or another path on the same server -- to which to direct the client. It is of the form Location: http://www.myserver.com/another/place/.

After the HTTP headers and the blank line, you can send the body of your program's output, whether it be an HTML page, an image, plain text, or whatever. Among the CGI programs included with the Apache web server, the nph-test-cgi and test-cgi effectively show the difference between the nph and non-nph style headers, respectively.
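Put together, the output of a non-nph CGI program has the shape produced by the following C sketch: a header, a blank line, and then the body. The functions write_response and write_redirect are hypothetical helpers written for this illustration; libraries such as CGI.pm and cgihtml supply their own equivalents.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes a minimal non-nph CGI response to the given
   stream: one Content-Type header, the blank line, then the body.
   Returns 0 on success and -1 if writing failed. */
int write_response(FILE *out, const char *content_type, const char *body)
{
    assert(out != NULL && content_type != NULL && body != NULL);

    if (fprintf(out, "Content-Type: %s\n\n", content_type) < 0)
        return -1;
    if (fputs(body, out) == EOF)
        return -1;
    return 0;
}

/* Hypothetical helper: sends the client to another URL instead of
   producing a body. */
int write_redirect(FILE *out, const char *url)
{
    assert(out != NULL && url != NULL);
    return fprintf(out, "Location: %s\n\n", url) < 0 ? -1 : 0;
}
```

A program would normally pass stdout as the stream, for example write_response(stdout, "text/html", page), where page holds the finished HTML.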

In this section, we will be using libraries such as CGI.pm and cgic that provide functions for printing out the HTTP as well as the HTML headers. This will allow you to concentrate on generating the content itself. These helper functions are demonstrated in the examples earlier in this chapter.