20.1 CGI in Python

CGI's standardization lets you use any language to code CGI scripts. Python is a very-high-level, high-productivity language, and thus quite suitable for CGI coding. The Python standard library supplies modules to handle typical CGI-related tasks.

20.1.1 Form Submission Methods

CGI scripts are often used to handle HTML form submissions. In this case, the action attribute of the form tag specifies the URL of a CGI script to handle the form, and the method attribute is either GET or POST, indicating how the form data is sent to the script. According to the HTTP and HTML specifications, the GET method should be used for forms without side effects, such as asking the server to query a database and display the results, while the POST method is meant for forms with side effects, such as asking the server to update a database. In practice, however, GET is also often used to create side effects. The practical difference between GET and POST is that GET encodes the form's contents as a query string joined to the action URL to form a longer URL, while POST transmits the form's contents as an encoded stream of data, which a CGI script sees as its standard input.
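To make the distinction concrete, here is a minimal sketch, in Python 3, of how a client encodes the same form fields both ways. The action URL and the field names are hypothetical; urllib.parse.urlencode applies essentially the same application/x-www-form-urlencoded encoding that browsers use for ordinary (non-file) forms.

```python
from urllib.parse import urlencode

# Hypothetical form: the action URL and control names are illustrative only
ACTION = "http://www.example.com/cgi-bin/query.py"
FIELDS = [("name", "Ann Lee"), ("city", "Paris")]

def get_request_url(action, fields):
    """GET: the encoded fields become a query string appended to the URL."""
    return action + "?" + urlencode(fields)

def post_request_body(fields):
    """POST: the same encoding, sent as the request body (the script's stdin)."""
    return urlencode(fields).encode("ascii")

print(get_request_url(ACTION, FIELDS))
# http://www.example.com/cgi-bin/query.py?name=Ann+Lee&city=Paris
print(post_request_body(FIELDS))
# b'name=Ann+Lee&city=Paris'
```

The encoded data is identical in both cases; only where it travels (the URL versus the request body) differs.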

The GET method is slightly faster, and you can use a fixed GET-form URL wherever you can use a hyperlink. However, GET cannot send large amounts of data to the server, since many clients and servers limit URL lengths (you're safe up to about 200 bytes). The POST method has no such size limit. You must use POST when the form contains input tags with type=file; in that case, the form tag must also have enctype=multipart/form-data.

The CGI standard does not specify whether a single script can access both the query string (used for GET) and the script's standard input (used for POST). Many clients and servers let you get away with it, but relying on this nonstandard practice may forfeit the portability that CGI's standardization otherwise gives you. Python's standard module cgi, covered in the next section, recovers form data only from the query string whenever a query string is present; when no query string is present, cgi recovers form data from standard input.
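The rule just described can be sketched as follows. This is a hypothetical helper, read_form_data, written in Python 3; it is not the cgi module's actual implementation. It relies only on the standard CGI environment variables QUERY_STRING and CONTENT_LENGTH, and takes the environment and input stream as parameters so it can be exercised without a web server.

```python
import io
import os
import sys
from urllib.parse import parse_qs

def read_form_data(environ=None, stdin=None):
    """Return a dict mapping each control name to a list of its values.

    Uses the query string when one is present; otherwise reads
    CONTENT_LENGTH characters of standard input. Assumes ASCII form
    data, so the byte count in CONTENT_LENGTH equals the number of
    characters read from a text-mode stdin.
    """
    if environ is None:
        environ = os.environ
    if stdin is None:
        stdin = sys.stdin
    query = environ.get("QUERY_STRING", "")
    if not query:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        query = stdin.read(length) if length > 0 else ""
    return parse_qs(query)

# Simulated GET and POST requests
print(read_form_data({"QUERY_STRING": "q=spam"}, io.StringIO("")))
# {'q': ['spam']}
print(read_form_data({"CONTENT_LENGTH": "11"}, io.StringIO("a=1&b=2&b=3")))
# {'a': ['1'], 'b': ['2', '3']}
```

Note that when a query string is present, the sketch never touches standard input, matching the behavior the text attributes to module cgi.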

20.1.2 The cgi Module

The cgi module supplies several functions and classes, mostly for backward compatibility or unusual needs. Typical CGI scripts need only one function and one class from module cgi.

escape

escape(str,quote=0)

Returns a copy of string str, replacing each occurrence of the characters &, <, and > with the corresponding HTML entity (&amp;, &lt;, &gt;). When quote is true, escape also replaces double-quote characters (") with &quot;. Function escape lets a script prepare arbitrary text strings for output within an HTML document, whether or not the strings contain characters that HTML interprets in special ways.
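The documented behavior of escape is simple enough to sketch directly. The version below is a hedged re-creation, not the cgi module's source; it is handy because cgi.escape was deprecated and then removed in Python 3.8, where html.escape (available since Python 3.2) takes its place. Be aware that html.escape defaults to quote=True and then also escapes single quotes.

```python
import html

def escape(s, quote=False):
    """Sketch of the documented behavior of cgi.escape."""
    s = s.replace("&", "&amp;")   # must come first, or it would re-escape the entities below
    s = s.replace("<", "&lt;").replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")  # needed when the text goes inside an attribute value
    return s

text = 'Tom & Jerry say "<hi>"'
print(escape(text))        # Tom &amp; Jerry say "&lt;hi&gt;"
print(escape(text, True))  # Tom &amp; Jerry say &quot;&lt;hi&gt;&quot;

# With quote=False, html.escape gives the same result
assert escape(text) == html.escape(text, quote=False)
```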

FieldStorage

class FieldStorage(keep_blank_values=0)

When your script instantiates a FieldStorage instance f, module cgi parses the query string, and/or standard input, as appropriate. You need not determine whether the client used the POST or GET method, as cgi hides the distinction. Your script must instantiate FieldStorage only once, since the instantiation may consume standard input.

An instance f of class FieldStorage is a mapping. f's keys are the name attributes of the form's controls. When keep_blank_values is true, f also includes controls whose values are blank strings. By default, f ignores such controls. f supplies methods f.has_key and f.keys, with normal mapping semantics. The value for each key n, f[n], can be either:

A list of k FieldStorage instances, if name n occurs more than once in the form (k is the number of occurrences of n)
A single FieldStorage instance, if name n occurs exactly once in the form

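How often a name occurs in a form depends on HTML form rules. A group of radio controls shares a name, but since at most one radio control in the group can be selected, the whole group amounts to at most one occurrence of the name. Checkbox controls that share a name, and select controls with the multiple attribute, yield one occurrence of the name for each checked box or selected option.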

Values in a FieldStorage instance are in turn FieldStorage instances, to let you handle nested forms. In practice, you don't need such complications: for each nested instance, just access the value (and occasionally other attributes), ignoring potential nested-mapping aspects. Avoid type tests: module cgi may optimize by using instances of MiniFieldStorage, a lightweight, signature-compatible class, instead of FieldStorage instances. You usually know which names are repeated in the form, and thus which items of f can be lists. When you don't know, find out with try/except, not with type tests (see Section 6.6 in Chapter 6 for details on this idiom).
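Here is one way the try/except idiom might look. So that the sketch runs without a web server (and on Python versions that no longer ship module cgi), the class StandIn is a hypothetical substitute for FieldStorage or MiniFieldStorage that supplies only the value attribute, and a plain dict plays the role of f.

```python
class StandIn:
    """Hypothetical stand-in for a (Mini)FieldStorage item: only .value."""
    def __init__(self, value):
        self.value = value

def values_of(item):
    """Return a list of string values for f[name], single or repeated."""
    try:
        return [item.value]               # a single instance has .value
    except AttributeError:
        return [x.value for x in item]    # a list of instances does not

form = {"user": StandIn("ann"), "tag": [StandIn("a"), StandIn("b")]}
print(values_of(form["user"]))  # ['ann']
print(values_of(form["tag"]))   # ['a', 'b']
```

Because the code only asks for value, it works equally well whether the item is a FieldStorage, a MiniFieldStorage, or anything else with a compatible signature.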

An instance f of class FieldStorage supplies the following three methods.

getfirst

f.getfirst(key,default=None)

When f.has_key(key) and f[key] is a single instance, not a list of instances, getfirst returns f[key].value. When f.has_key(key) and f[key] is a list of instances, getfirst returns f[key][0].value, the value of the first occurrence. When key is not a key in f, getfirst returns default.

Use getfirst when you know that there should be just one input field (or at most one input field) named key in the form from which your script's input comes. getfirst was introduced in Python 2.2, so don't use it if your script must remain compatible with older versions of Python.

getlist

f.getlist(key)

When f.has_key(key) and f[key] is a single instance, not a list of instances, getlist returns [f[key].value], i.e., a list whose only item is f[key].value. When f.has_key(key) and f[key] is a list of instances, getlist returns the list of their values, [x.value for x in f[key]]. When key is not a key in f, getlist returns the empty list [].

Use getlist when you know that there can be more than one input field named key in the form from which your script's input comes. getlist was introduced in Python 2.2, so don't use it if your script must remain compatible with older versions of Python.

getvalue

f.getvalue(key,default=None)

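Like f[key].value when f.has_key(key) and f[key] is a single instance; when f[key] is a list of instances, getvalue returns the list of their values. When key is not a key in f, getvalue returns default. getvalue is slightly less convenient than methods getfirst and getlist, since its result may be either a string or a list; the only reason to use getvalue is if your script must remain compatible with versions of Python older than 2.2, which lack methods getfirst and getlist.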
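The differences among the three methods are easiest to see side by side. The functions below are a hedged sketch of the behavior described above, not the cgi module's code: a plain dict stands in for a FieldStorage instance, and types.SimpleNamespace objects (aliased here as the hypothetical name Item) stand in for the items, since all that matters is their value attribute.

```python
from types import SimpleNamespace as Item  # hypothetical stand-in: just .value

def _values(item):
    """Values for one form entry, whether it is one item or a list of items."""
    try:
        return [item.value]
    except AttributeError:
        return [x.value for x in item]

def getfirst(form, key, default=None):
    return _values(form[key])[0] if key in form else default

def getlist(form, key):
    return _values(form[key]) if key in form else []

def getvalue(form, key, default=None):
    if key not in form:
        return default
    try:
        return form[key].value                 # one occurrence: a string
    except AttributeError:
        return [x.value for x in form[key]]    # repeated name: a list of strings

form = {"user": Item(value="ann"), "tag": [Item(value="a"), Item(value="b")]}
print(getfirst(form, "tag"))            # a
print(getlist(form, "user"))            # ['ann']
print(getvalue(form, "tag"))            # ['a', 'b']
print(getfirst(form, "missing", "?"))   # ?
```

Note how getvalue's result changes type depending on the form's contents, while getfirst always yields a string (or the default) and getlist always yields a list.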

An instance f of class FieldStorage supplies the following attributes:

disposition: The Content-Disposition header, or None if no such header is present
disposition_options: A mapping of all the options in the Content-Disposition header, if any
headers: A mapping of all headers, normally an instance of the rfc822.Message class covered in Chapter 21
file: A file-like object from which you can read the control's value, if applicable; None if the value is held in memory as a string, as happens for most controls
filename: The filename as specified by the client, for file controls; otherwise None
name: The name attribute of the control, or None if no such attribute is present
type: The Content-Type header, or None if no such header is present
type_options: A mapping of all the options in the Content-Type header, if any
value: The control's value as a string; if f is keeping the control's value in a file, then f implicitly reads the file into memory each time you access f.value
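Since accessing value on a file control reads the whole file into memory each time, a script that accepts large uploads may prefer to copy from the file attribute in chunks. The helper save_upload below is hypothetical, written in Python 3; it assumes only an object with file and filename attributes as listed above, so a SimpleNamespace wrapping an in-memory stream can stand in for a real FieldStorage item.

```python
import io
from types import SimpleNamespace

def save_upload(item, dest, chunk_size=64 * 1024):
    """Copy a file control's contents to the binary file-like object dest.

    item is assumed to expose .file like a FieldStorage instance for a
    file control. Returns the number of bytes copied.
    """
    if item.file is None:
        raise ValueError("control value is held in memory, not in a file")
    copied = 0
    while True:
        chunk = item.file.read(chunk_size)   # never holds more than one chunk
        if not chunk:
            break
        dest.write(chunk)
        copied += len(chunk)
    return copied

upload = SimpleNamespace(file=io.BytesIO(b"x" * 100000), filename="data.bin")
dest = io.BytesIO()
copied = save_upload(upload, dest)
print(upload.filename, copied)  # data.bin 100000
```

In a real script, dest would typically be a file opened in binary mode ("wb"); never use the client-supplied filename as a path without sanitizing it.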

In most cases, attribute value is all you need. Other attributes are useful for file controls, which may have very large values and metadata such as content type and content disposition headers. Checkbox controls that share a name, and select controls that allow multiple choices, submit a separate name/value pair for each checked box or selected option, so their name can occur several times in the form data. The idiom:

values=f.getlist(n)

collects all of those values into a list of individual strings, however many options the user chose.

20.1.3 CGI Output and Errors

When the server runs a CGI script to meet a request, the response to the request is the standard output of the script. The script must output the HTTP headers it needs, then an empty line, then the response's body. In particular, the script must always output the Content-Type header. Most often, the script outputs the Content-Type header as:

Content-Type: text/html

In this case, the response body must be HTML. However, the script may also choose to output a content type of text/plain (i.e., the response body must be plain text) or any other MIME type followed by a response body conforming to that MIME type. The MIME type must be compatible with the Accept header that the client sent, if any.
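The required structure (header lines, one empty line, then the body) can be captured in a small helper. The function write_response below is a hypothetical sketch in Python 3; it writes to sys.stdout by default but accepts any text stream, which makes the exact bytes the server would receive easy to inspect.

```python
import io
import sys

def write_response(body, content_type="text/plain", extra_headers=(), out=None):
    """Write a CGI response to out (default: sys.stdout).

    Order matters: header lines first, then one empty line, then the body.
    """
    if out is None:
        out = sys.stdout
    out.write("Content-Type: %s\n" % content_type)
    for name, value in extra_headers:       # any further headers the script needs
        out.write("%s: %s\n" % (name, value))
    out.write("\n")                         # the empty line ends the headers
    out.write(body)

buf = io.StringIO()
write_response("Hello, CGI World!\n", out=buf)
print(repr(buf.getvalue()))
# 'Content-Type: text/plain\n\nHello, CGI World!\n'
```

Forgetting the empty line is a classic CGI bug: the server then treats the first line of the body as a (malformed) header.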

Here is the simplest possible Python CGI script in the tradition of "Hello World": it ignores its input and outputs just one line of plain text:

print "Content-Type: text/plain"
print
print "Hello, CGI World!"

Most often, you want to output HTML, and this is similarly easy:

print "Content-Type: text/html"
print
print "<html><head><title>Hello, HTML</title></head>"
print "<body><p>Hello, CGI and HTML together!</p></body></html>"

Browsers are quite forgiving in parsing HTML: you could get by without the HTML structure tags that this code outputs. However, being fully correct costs little. For other ways to generate HTML output, see Chapter 22.

Many web servers collect all output from a CGI script, then send it to the client browser in one gulp, so you cannot rely on sending the client any progress information, just final results. If you need to output binary data (on a platform where binary and text files differ, such as Windows), you must ensure that python is run with the -u switch, covered in Chapter 3. A more robust approach is to text-encode your output, using the encoding modules covered in Chapter 21 (typically with Base-64 encoding) and a suitable Content-Transfer-Encoding header. A browser that honors the Content-Transfer-Encoding header will then decode your output and recover the binary data thus encoded.
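A quick check with the base64 module shows where the size overhead of Base-64 comes from: every 3 input bytes become 4 output characters, an increase of one third, plus any padding and any line breaks the chosen encoder inserts (base64.b64encode inserts none; base64.encodebytes adds a newline every 76 characters). The data here is arbitrary sample bytes, written in Python 3.

```python
import base64

data = bytes(range(256)) * 3          # 768 bytes of arbitrary binary data
encoded = base64.b64encode(data)      # every 3 input bytes -> 4 ASCII characters
print(len(data), len(encoded))        # 768 1024
print(len(encoded) / len(data))       # 1.3333333333333333
assert base64.b64decode(encoded) == data   # decoding recovers the original bytes
```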

Such encoding makes your output about a third larger, which in some cases can cause performance problems. In such cases, it can be preferable to ensure that your script's standard output stream is a binary file. On Windows, specifically, an alternative to using the -u switch for this purpose is:

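import msvcrt, os, sys
# put standard output (file descriptor 1) into binary mode, before any output
msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)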

However, if you can ensure it's used, the -u switch is preferable, since it's cross-platform.
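In Python 3, another option is available: sys.stdout is a text stream, but its buffer attribute exposes the underlying binary stream, to which you can write bytes with no newline translation on any platform. The helper write_binary_response below is a hypothetical sketch of that approach; it takes the output stream as a parameter so it can be tested against an in-memory stream.

```python
import io
import sys

def write_binary_response(data, content_type, out=None):
    """Write text headers, then raw bytes, to a Python 3 text stream."""
    if out is None:
        out = sys.stdout
    out.write("Content-Type: %s\n\n" % content_type)
    out.flush()               # push the header text through before writing bytes
    out.buffer.write(data)    # binary layer: bytes pass through unchanged
    out.buffer.flush()

raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="ascii", newline="")
write_binary_response(b"\x89PNG\r\n", "image/png", out=fake_stdout)
print(raw.getvalue())  # b'Content-Type: image/png\n\n\x89PNG\r\n'
```

The explicit flush matters: without it, buffered header text could reach the client after the binary data. Also note that sys.stdout may lack a buffer attribute if it has been replaced by another object, such as an io.StringIO.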

20.1.3.1 Error messages

If exceptions propagate from your script, Python outputs traceback diagnostics to standard error. With most web servers, this error information ends up in the server's error logs, while the client browser receives only a concise, generic error message. This may be acceptable if you can access the error logs, but seeing detailed error information in the client browser makes your life easier when you debug a CGI script. When you know that a script has bugs and you need an error trace for debugging, you can use a content type of text/plain and redirect standard error to standard output as shown here:

print "Content-Type: text/plain"
print
import sys
sys.stderr = sys.stdout
def witherror():
    return 1/0
print "Hello, CGI with an error!"
print "Trying to divide by 0 produces:",witherror()
print "The script does not reach this part..."

If your script fails only occasionally and you want to see HTML-formatted output up to the point of failure, you can use a more sophisticated approach based on the traceback module covered in Chapter 17, as shown here:

import sys
sys.stderr = sys.stdout
import traceback
print "Content-Type: text/html"
print
try:
    def witherror():
        return 1/0
    print "<html><head><title>Hello, traceback</title></head><body>"
    print "<p>Hello, CGI with an error traceback!"
    print "<p>Trying to divide by 0 produces:",witherror()
    print "<p>The script does not reach this part..."
except:
    print "<br><strong>ERROR detected:</strong><br><pre>"
    # print_exc writes to sys.stderr, currently redirected to the browser
    traceback.print_exc()
    # restore the real standard error so the traceback also reaches the log
    sys.stderr = sys.__stderr__
    traceback.print_exc()

After imports, redirection, and content-type output, this example runs the substantive part of the script in the try clause of a try/except statement. In the except clause, the script outputs a <br> tag, terminating any current line, and then a <pre> tag to ensure that further line breaks are honored. Function print_exc of module traceback outputs all error information. Lastly, the script restores standard error and outputs the error information again, so that it also reaches the error logs for later study instead of being only transiently displayed in the client browser. These refinements are not very useful in this specific example, of course, since the error is repeatable, but they help track down real-life errors.
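One further refinement may be worth considering: traceback text can itself contain characters such as < and &, which the <pre> block above passes to the browser unescaped. The hypothetical helper error_html below, written in Python 3, captures the traceback as a string with traceback.format_exc (which returns the same text print_exc would print) and escapes it with html.escape before embedding it.

```python
import html
import traceback

def error_html():
    """Return the current exception's traceback as an escaped HTML fragment.

    Must be called from inside an except clause.
    """
    text = traceback.format_exc()
    return ("<br><strong>ERROR detected:</strong><br><pre>%s</pre>"
            % html.escape(text))

try:
    raise ValueError("bad <input> & more")
except ValueError:
    fragment = error_html()
print(fragment)
```

The script can then print the fragment to the browser and, separately, write the unescaped traceback to the real standard error for the logs.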

20.1.3.2 The cgitb module

The simplest way to provide good error reporting in CGI scripts is to use module cgitb. Module cgitb supplies two functions.

handler

handler(info=None)

Reports an exception's traceback to the browser. info is a tuple with three items (type,value,tb), just like the result of calling sys.exc_info(), covered in Chapter 8. When info is None, handler calls sys.exc_info to get the information about the exception to display.

enable

enable(display=True,logdir=None,context=5)

Installs an exception hook, via sys.excepthook, to diagnose propagated exceptions. The hook displays the exception traceback on the browser if display is true. The hook logs the exception traceback to a file in directory logdir if logdir is not None. In the traceback, the hook shows context lines of source code per frame.

In practice, you can start all of your CGI scripts with:

import cgitb
cgitb.enable()

and be assured of good error reporting to the browser with minimal effort on your part. Of course, when you don't want users of your page to see Python tracebacks from your scripts on their browsers, you can call cgitb.enable(False,'/my/log/dir') and get the error reports, with exception tracebacks, as files in directory /my/log/dir instead.

20.1.4 Installing Python CGI Scripts

Installation of CGI scripts depends on the web server and host platform. A script coded in Python is no different in this respect from scripts coded in other languages. Of course, you must ensure that the Python interpreter and standard library are installed and accessible. On Unix-like platforms, you must set the x permission bits for the script and use a so-called shebang line as the script's first line. For example:

#!/usr/local/bin/python

depending on the details of your platform and Python installation. If you copy or share files between Unix and Windows platforms, make sure the shebang line does not end with a carriage return (\r), which might confuse the shell or web server that parses the shebang line to find out which interpreter to use for your script.
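A stray carriage return on the shebang line is easy to miss in an editor, so a small check can help. The functions shebang_has_cr and to_unix_newlines below are hypothetical helpers, sketched in Python 3; they work on the script's raw bytes, which you would obtain with open(path, "rb").read().

```python
def shebang_has_cr(script_bytes):
    """True if the first line is a shebang line ending with a carriage return."""
    first_line = script_bytes.split(b"\n", 1)[0]
    return first_line.startswith(b"#!") and first_line.endswith(b"\r")

def to_unix_newlines(script_bytes):
    """Convert Windows (CRLF) line endings to Unix (LF) ones."""
    return script_bytes.replace(b"\r\n", b"\n")

dos_script = b"#!/usr/local/bin/python\r\nprint('hi')\r\n"
print(shebang_has_cr(dos_script))                    # True
print(shebang_has_cr(to_unix_newlines(dos_script)))  # False
```

Writing the converted bytes back in binary mode ("wb") keeps Python from reintroducing platform-specific line endings.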

20.1.4.1 Python CGI scripts on Microsoft web servers

If your web server is Microsoft IIS 3 or 4 or Microsoft PWS (Personal Web Server), assign file extensions to CGI scripts via entries in registry path HKLM\System\CurrentControlSet\Services\W3Svc\Parameters\Script_Map. Each value in this path is named by a file extension, such as .pyg (each value's name starts with a period). The value is the interpreter command (e.g., C:\Python22\Python.Exe -u %s %s). You may also use file extensions such as .cgi or .py for this purpose, but I recommend a unique one such as .pyg instead. Assigning Python as the interpreter for all scripts named .cgi might interfere with your ability to use other interpreters for CGI purposes. Having all modules with a .py extension interpreted as CGI scripts is more accident-prone than dedicating a unique extension such as .pyg to this purpose, and may interfere with your ability to have your Python-coded CGI scripts import utility modules from the same directories.

With IIS 5, you can use the Administrative Tools Computer Management applet to associate a file extension with an interpreter command line. This is performed via Services and Applications → Internet Information Services. Right-click either on [IISAdmin], for all sites, or on a specific web site, and choose Properties → Configuration → Add Mappings → Add. Enter the extension, such as .pyg, in the Extension field, and the interpreter command line, such as C:\Python22\Python.Exe -u %s %s, in the Executable field.

20.1.4.2 Python CGI scripts on Apache

The popular free web server Apache is configured via directives in a text file (by default, httpd.conf). When the configuration has ScriptAlias entries, such as:

ScriptAlias /cgi-bin/ /usr/local/apache/cgi-bin/

any executable script in the aliased directory can run as a CGI script. You may also enable CGI execution in a specific directory by applying the following Apache directive to that directory:

Options +ExecCGI

In this case, to let scripts with a certain extension run as CGI scripts, you may also add a global AddHandler directive, such as:

AddHandler cgi-script pyg

to enable scripts with extension .pyg to run as CGI scripts. Apache determines what interpreter to use for a script by the shebang line at the script's start. Another way to enable CGI scripts in a directory (if global directive AllowOverride Options is set) is to use Options +ExecCGI in a file named .htaccess in that directory.

20.1.4.3 Python CGI scripts on Xitami

The free, lightweight, simple web server Xitami (http://www.xitami.org) makes it easy to install CGI scripts. When any component of a URL is named cgi-bin, Xitami takes the URL as a request for CGI execution. Xitami determines what interpreter to use for a script by the shebang line at the script's start, even on Windows platforms.