20.1 CGI in Python
CGI's standardization lets you use any language to
code CGI scripts. Python is a very-high-level, high-productivity
language, and thus quite suitable for CGI coding. The Python standard
library supplies modules to handle typical CGI-related tasks.
20.1.1 Form Submission Methods
CGI scripts are often used to
handle HTML form submissions. In this case, the
action attribute of the form
tag specifies a URL for a CGI script to handle the form, and the
method attribute is either GET
or POST, indicating how the form data is sent to
the script. According to the CGI standard, the GET method should be
used for forms without side effects, such as asking the server to
query a database and display the results, while the POST method is
meant for forms with side effects, such as asking the server to
update a database. In practice, however, GET is also often used to
create side effects. The distinction between GET and POST in
practical use is that GET encodes the form's
contents as a query string joined to the action
URL to form a longer URL, while POST transmits the
form's contents as an encoded stream of data, which
a CGI script sees as the script's standard input.
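The difference between the two transports can be sketched as follows. This is a minimal illustration in Python 3 syntax (the listings in this chapter use Python 2 print statements); the form fields and the action URL are hypothetical, and the POST body shown assumes the form's default URL-encoded enctype.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form contents and action URL, for illustration only.
fields = {"name": "Ann Lee", "lang": "python"}
action = "http://www.example.com/cgi-bin/search.py"

# Both methods start from the same URL-encoded string.
encoded = urlencode(fields)

# GET: the encoded data is joined to the action URL as a query string.
get_url = action + "?" + encoded

# POST: the encoded data travels in the request body instead,
# and the CGI script reads it from standard input.
post_body = encoded

# Decoding recovers the original names and values (as lists of strings).
decoded = parse_qs(encoded)
```

Here get_url grows with the amount of form data, which is why URL length limits matter for GET but not for POST.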
The GET method is slightly faster. You can use a fixed GET-form URL
wherever you can use a hyperlink. However, GET cannot send large
amounts of data to the server, since many clients and servers limit
URL lengths (you're safe up to about 200 bytes). The
POST method has no size limits. You must use POST when the form
contains input tags with
type=file—the form tag
must then have enctype=multipart/form-data.
The CGI standard does not specify whether a single script can access
both the query string (used for GET) and the
script's standard input (used for POST). Many
clients and servers let you get away with it, but relying on this
nonstandard practice may negate the portability advantages that you
would otherwise get from the fact that CGI is a standard.
Python's standard module cgi,
covered in the next section, recovers form data from the query string
only, when any query string is present; otherwise, when no query
string is present, cgi recovers form data from
standard input.
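The selection rule just described can be sketched in a few lines. This is only an illustration of the rule as stated here, not the cgi module's own code; the function name read_raw_form_data is hypothetical, while QUERY_STRING and CONTENT_LENGTH are the standard CGI environment variables. The sketch uses Python 3 syntax.

```python
import io

def read_raw_form_data(environ, stdin):
    """Return the raw encoded form data, preferring the query string."""
    query = environ.get("QUERY_STRING", "")
    if query:
        # A query string is present: use it and ignore standard input.
        return query
    # No query string: read the request body from standard input.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    return stdin.read(length)

# Simulated GET request: data arrives in the query string.
from_get = read_raw_form_data({"QUERY_STRING": "q=spam"}, io.StringIO(""))

# Simulated POST request: no query string, data arrives on standard input.
from_post = read_raw_form_data(
    {"QUERY_STRING": "", "CONTENT_LENGTH": "6"}, io.StringIO("q=eggs"))
```

In a real script, environ would be os.environ and stdin would be sys.stdin.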
20.1.2 The cgi Module
The cgi module
supplies several functions and classes, mostly for backward
compatibility or unusual needs. CGI scripts use one function and one
class from module cgi.
escape(str,quote=0)
Returns a copy of string str, replacing
each occurrence of characters &,
<, and > with the
appropriate HTML entity (&amp;,
&lt;, &gt;). When
quote is true, escape
also replaces double quote characters (") with
&quot;. Function escape
lets a script prepare arbitrary text strings for output within an
HTML document, whether or not the strings contain characters that
HTML interprets in special ways.
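The replacements that escape performs can be sketched as below. This is an illustrative stand-in, not the module's source; the name escape_sketch is hypothetical. Note that the ampersand must be replaced first, or the ampersands introduced by the other entities would themselves be escaped.

```python
def escape_sketch(text, quote=False):
    """Replace HTML-special characters with entities (illustrative only)."""
    text = text.replace("&", "&amp;")   # must come first
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    if quote:
        # Needed when the text goes inside a double-quoted attribute value.
        text = text.replace('"', "&quot;")
    return text

plain = escape_sketch('a < b & "c"')
quoted = escape_sketch('a < b & "c"', quote=True)
```

Pass a true quote argument whenever the escaped text ends up inside an attribute value rather than in element content.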
class FieldStorage(keep_blank_values=0)
When your script instantiates a
FieldStorage instance
f, module cgi parses
the query string, and/or standard input, as appropriate. You need not
determine whether the client used the POST or GET method, as
cgi hides the distinction. Your script must
instantiate FieldStorage only once, since the
instantiation may consume standard input.
An instance f of class
FieldStorage is a mapping.
f's keys are the
name attributes of the form's
controls. When keep_blank_values is true,
f also includes controls whose values are
blank strings. By default, f ignores such
controls. f supplies methods
f.has_key and
f.keys, with normal
mapping semantics. The value for each key
n,
f[n],
can be either:
- A list of k FieldStorage instances, if name n occurs more than once in the form (k is the number of occurrences of n)
- A single FieldStorage instance, if name n occurs exactly once in the form
How often a name occurs in a form depends on HTML
form rules. Groups of radio or
checkbox controls share a name,
but an entire group amounts to just one occurrence of the name.
Values in a FieldStorage instance are in turn
FieldStorage instances, to let you handle nested
forms. In practice, you don't need such
complications. For each nested instance, just access the value (and
occasionally other attributes), ignoring potential nested-mapping
aspects. Avoid type tests: module cgi can
optimize, using instances of MiniFieldStorage, a
lightweight signature-compatible class instead of
FieldStorage instances. You usually know what
name values are repeated in the form, and thus you
know which items of f can be lists. When
you don't know, find out with
try/except, not with type tests
(see Section 6.6 in
Chapter 6 for details on this idiom).
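The try/except approach might look like the following sketch. The class FakeField is a hypothetical stand-in for whatever field object the cgi module hands back, and values_of is an illustrative helper name; the point is only that the code asks for the value attribute and falls back to iteration when the item turns out to be a list.

```python
class FakeField:
    """Hypothetical stand-in for a field object with a value attribute."""
    def __init__(self, value):
        self.value = value

def values_of(item):
    """Return the item's values as a list, without any type test."""
    try:
        # A single field: it has a value attribute.
        return [item.value]
    except AttributeError:
        # A list of fields: lists have no value attribute.
        return [field.value for field in item]

single = values_of(FakeField("spam"))
several = values_of([FakeField("spam"), FakeField("eggs")])
```

Because the helper never checks the class of item, it works the same whether the module supplies FieldStorage or MiniFieldStorage instances.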
An instance f of class
FieldStorage supplies the following three methods.
f.getfirst(key,default=None)
When
f.has_key(key),
and
f[key].value
is a single value, not a list of values, getfirst
returns
f[key].value.
When
f.has_key(key),
and
f[key].value
is a list of values, getfirst returns
f[key].value[0].
When key is not a key in
f, getfirst returns
default.
Use getfirst when you know that there should be
just one input field (or at most one input field) named
key in the form from which your
script's input comes. getfirst
was introduced in Python 2.2, so don't use it if
your script must remain compatible with older versions of Python.
f.getlist(key)
When
f.has_key(key),
and
f[key].value
is a single value, not a list of values, getlist
returns
[f[key].value],
i.e., a list whose only item is
f[key].value.
When
f.has_key(key),
and
f[key].value
is a list of values, getlist returns
f[key].value.
When key is not a key in
f, getlist returns the
empty list [].
Use getlist when you know that there can be more
than one input field named key in the form
from which your script's input comes.
getlist was introduced in Python 2.2, so
don't use it if your script must remain compatible
with older versions of Python.
f.getvalue(key,default=None)
Like
f[key].value
when
f.has_key(key),
otherwise returns default.
getvalue is slightly less convenient than methods
getfirst or getlist; the only
reason to use getvalue is if your script must
remain compatible with old versions of Python, since methods
getfirst and getlist were
introduced in Python 2.2.
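The differences among the three methods can be summarized with a small model. In this sketch a plain dictionary stands in for the FieldStorage instance, mapping each name directly to a string or a list of strings; the functions are hypothetical equivalents written for illustration, not the module's implementation.

```python
def getvalue(form, key, default=None):
    """The stored value as is (string or list), or default if absent."""
    return form.get(key, default)

def getfirst(form, key, default=None):
    """Always a single value: the first one when several are present."""
    if key not in form:
        return default
    value = form[key]
    return value[0] if isinstance(value, list) else value

def getlist(form, key):
    """Always a list: empty when absent, one item when single."""
    if key not in form:
        return []
    value = form[key]
    return value if isinstance(value, list) else [value]

# Hypothetical form data: one name occurs once, the other twice.
form = {"user": "ann", "alias": ["annie", "al"]}
```

With getfirst and getlist the caller always knows the shape of the result in advance; with getvalue the caller must be ready for either a string or a list.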
An instance f of class
FieldStorage supplies the following attributes:
- disposition: The Content-Disposition header, or None if no such header is present
- disposition_options: A mapping of all the options in the Content-Disposition header, if any
- headers: A mapping of all headers, normally an instance of the rfc822.Message class covered in Chapter 21
- file: A file-like object from which you can read the control's value, if applicable; None if the value is held in memory as a string, as happens for most controls
- filename: The filename as specified by the client, for file controls; otherwise None
- name: The name attribute of the control, or None if no such attribute is present
- type: The Content-Type header, or None if no such header is present
- type_options: A mapping of all the options in the Content-Type header, if any
- value: The control's value as a string; if f is keeping the control's value in a file, then f implicitly reads the file into memory each time you access f.value
In most cases, attribute value is all you need.
Other attributes are useful for file controls,
which may have very large values and metadata such as content type
and content disposition headers. checkbox controls
that share a name, and multiple-choice
select controls, have values that are strings
representing comma-separated lists of options. The idiom:
values=f.getfirst(n,'').split(',')
breaks apart such composite value strings into a list of their
individual component strings.
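One detail of this idiom deserves care: splitting an empty string yields a list holding one empty string, not an empty list. The sketch below shows that behavior and one possible way to filter it out; split_options is a hypothetical helper name, and the input strings are made up for illustration.

```python
def split_options(raw):
    """Split a comma-separated value string, dropping empty pieces."""
    return [part for part in raw.split(",") if part]

# Plain split keeps an empty string when there is no value at all.
naive_empty = "".split(",")

# The filtered version returns an empty list in that case.
clean_empty = split_options("")
clean_two = split_options("red,green")
```

Whether to drop empty pieces depends on the form: do so only if an empty option is never a meaningful choice.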
20.1.3 CGI Output and Errors
When the server runs a CGI script to
meet a request, the response to the request is the standard output of
the script. The script must output the HTTP headers it needs, then an
empty line, then the response's body. In particular,
the script must always output the Content-Type header. Most often,
the script outputs the Content-Type header as:
Content-Type: text/html
In this case, the response body must be HTML. However, the script may
also choose to output a content type of text/plain
(i.e., the response body must be plain text) or any other MIME type
followed by a response body conforming to that MIME type. The MIME
type must be compatible with the Accept header that the client sent,
if any.
Here is the simplest possible Python CGI script in the tradition of
"Hello World," ignoring its input
and outputting just one line of plain text output:
print "Content-Type: text/plain"
print
print "Hello, CGI World!"
Most often, you want to output HTML, and this is similarly easy:
print "Content-Type: text/html"
print
print "<html><head><title>Hello, HTML</title></head>"
print "<body><p>Hello, CGI and HTML together!</p></body></html>"
Browsers are quite forgiving in parsing HTML: you could get by
without the HTML structure tags that this code outputs. However,
being fully correct costs little. For other ways to generate HTML
output, see Chapter 22.
The web server collects all output from a CGI script, then sends it
to the client browser in one gulp. Therefore, you cannot send to the
client any progress information, just final results. If you need to
output binary data (on a platform where binary and text files differ,
such as Windows), you must ensure python is
called with the -u switch, covered in Chapter 3. A more robust approach is to text-encode your
output, using the encoding modules covered in Chapter 21 (typically with Base-64 encoding) and a
suitable Content-Transfer-Encoding header. A standards-compliant
browser will then decode your output according to the
Content-Transfer-Encoding header and recover the binary data thus
encoded.
Such encoding makes your output about 30% larger, which in some cases
can give performance problems. In such cases, ensuring that your
script's standard output stream is a binary file can
be preferable. On Windows, specifically, an alternative to using the
-u switch for this purpose is:
import msvcrt, os
msvcrt.setmode(1, os.O_BINARY)
However, if you can ensure it's used, the
-u switch is preferable, since
it's cross-platform.
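The size overhead of Base-64 mentioned above is easy to check: every 3 bytes of input become 4 characters of output, so the growth is about one third before any line breaks are added. The following sketch (Python 3 syntax, with made-up sample data) measures it.

```python
import base64

# Sample binary payload: 3072 bytes covering every possible byte value.
raw = bytes(range(256)) * 12

# Base-64 maps each group of 3 input bytes to 4 output characters.
encoded = base64.b64encode(raw)

# Ratio of encoded size to original size.
ratio = len(encoded) / float(len(raw))

# Decoding restores the original bytes exactly.
restored = base64.b64decode(encoded)
```

For this payload the encoded form is 4096 bytes long, so ratio is 4/3.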
20.1.3.1 Error messages
If
exceptions propagate from your script, Python outputs traceback
diagnostics to standard error. With most web servers, error
information ends up in error logs. The client browser receives a
concise generic error message. This may be okay, if you can access
the error logs. Seeing detailed error information in the client
browser makes your life easier when you debug a CGI script. When you
know that a script has bugs and you need an error trace for
debugging, you can use a content type of
text/plain and redirect standard error to standard
output as shown here:
print "Content-Type: text/plain"
print
import sys
sys.stderr = sys.stdout
def witherror( ):
    return 1/0
print "Hello, CGI with an error!"
print "Trying to divide by 0 produces:",witherror( )
print "The script does not reach this part..."
If your script fails only occasionally and you want to see
HTML-formatted output up to the point of failure, you can use a more
sophisticated approach based on the traceback
module covered in Chapter 17, as shown here:
import sys
sys.stderr = sys.stdout
import traceback
print "Content-Type: text/html"
print
try:
    def witherror( ):
        return 1/0
    print "<html><head><title>Hello, traceback</title></head><body>"
    print "<p>Hello, CGI with an error traceback!"
    print "<p>Trying to divide by 0 produces:",witherror( )
    print "<p>The script does not reach this part..."
except:
    print "<br><strong>ERROR detected:</strong><br><pre>"
    traceback.print_exc( )
    sys.stderr = sys.__stderr__
    traceback.print_exc( )
After imports, redirection, and content-type output, this example
runs the script's substantial part in the
try clause of a
try/except statement. In the
except clause, the script outputs a
<br> tag, terminating any current line, and
then a <pre> tag to ensure that further line
breaks are honored. Function print_exc of module
traceback outputs all error information. Lastly,
the script restores standard error and outputs error information
again. Thus, the information is also in the error logs for later
study, not just transiently displayed in the client browser. These
refinements are not very useful in this specific example, of course,
since the error is repeatable, but they help track down real-life
errors.
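A variation on the same idea is to capture the traceback as a string, so that it can be escaped before it is embedded in the page. The sketch below is written in Python 3 syntax and is only one possible arrangement; the function name run_with_html_traceback is hypothetical, while traceback.format_exc and html.escape are standard library calls.

```python
import html
import traceback

def run_with_html_traceback(func):
    """Call func; on failure, return its traceback as an HTML fragment."""
    try:
        return func()
    except Exception:
        # format_exc returns the same text that print_exc would output.
        text = traceback.format_exc()
        # Escape the text so that < and & in it cannot break the page.
        return "<pre>" + html.escape(text) + "</pre>"

def witherror():
    return 1 / 0

fragment = run_with_html_traceback(witherror)
```

Escaping matters because traceback text can contain source lines with characters that HTML interprets in special ways.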
20.1.3.2 The cgitb module
The simplest
way to provide good error reporting in CGI scripts is to use module
cgitb. Module cgitb supplies
two functions.
handle(exception=None)
Reports an exception's traceback to the browser.
exception is a tuple with three items
(type,value,tb),
just like the result of calling sys.exc_info( ),
covered in Chapter 8. When
exception is None,
handle calls exc_info to get
the information about the exception to display.
enable(display=True,logdir=None,context=5)
Installs an exception hook, via sys.excepthook, to
diagnose propagated exceptions. The hook displays the exception
traceback on the browser if display is
true. The hook logs the exception traceback to a file in directory
logdir if
logdir is not None. In
the traceback, the hook shows context
lines of source code per frame.
In practice, you can start all of your CGI scripts with:
import cgitb
cgitb.enable( )
and be assured of good error reporting to the browser with minimal
effort on your part. Of course, when you don't want
users of your page to see Python tracebacks from your scripts on
their browsers, you can call
cgitb.enable(False,'/my/log/dir') and get the error
reports, with exception tracebacks, as files in directory
/my/log/dir instead.
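The exception-hook mechanism that such a module relies on can be sketched with the standard library alone. This is a simplified illustration in Python 3 syntax, not the cgitb implementation; make_hook is a hypothetical name, and the hook here writes a plain-text traceback to whatever stream it is given.

```python
import io
import sys
import traceback

def make_hook(stream):
    """Build an exception hook that reports tracebacks to stream."""
    def hook(exc_type, exc_value, tb):
        stream.write("ERROR detected:\n")
        traceback.print_exception(exc_type, exc_value, tb, file=stream)
    return hook

# Demonstration: call the hook by hand with a caught exception.
# A real script would instead install it once, near the top:
#     sys.excepthook = make_hook(sys.stdout)
buffer = io.StringIO()
hook = make_hook(buffer)
try:
    1 / 0
except ZeroDivisionError:
    hook(*sys.exc_info())
report = buffer.getvalue()
```

Once a hook is assigned to sys.excepthook, Python calls it for any exception that propagates out of the script, in place of the default traceback display on standard error.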
20.1.4 Installing Python CGI Scripts
Installation of CGI scripts depends
on the web server and host platform. A script coded in Python is no
different in this respect from scripts coded in other languages. Of
course, you must ensure that the Python interpreter and standard
library are installed and accessible. On Unix-like platforms, you
must set the x permission bits for the script and
use a so-called shebang line as the script's first
line. For example:
#!/usr/local/bin/python
depending on the details of your platform and Python installation. If
you copy or share files between Unix and Windows platforms, make sure
the shebang line does not end with a carriage return
(\r), which might confuse the shell or web server
that parses the shebang line to find out which interpreter to use for
your script.
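A quick check for the carriage-return problem might look like the sketch below. The function name shebang_problem and its return strings are hypothetical; the check works on the raw bytes of the script's first line, since opening the file in text mode could hide the carriage return.

```python
def shebang_problem(first_line):
    """Return a description of a shebang problem, or None if it looks fine.

    first_line is the script's first line as bytes, read in binary mode.
    """
    if not first_line.startswith(b"#!"):
        return "missing shebang"
    if first_line.rstrip(b"\n").endswith(b"\r"):
        return "carriage return at end of shebang line"
    return None

unix_style = shebang_problem(b"#!/usr/local/bin/python\n")
windows_style = shebang_problem(b"#!/usr/local/bin/python\r\n")
```

To apply it to a real file, read the first line with open(path, "rb").readline( ) and pass the result to the function.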
20.1.4.1 Python CGI scripts on Microsoft web servers
If your web server is Microsoft IIS 3
or 4 or Microsoft PWS (Personal Web Server), assign file extensions
to CGI scripts via entries in registry path
HKLM\System\CurrentControlSet\Services\W3Svc\Parameters\Script_Map.
Each value in this path is named by a file extension, such as
.pyg (each value's name starts
with a period). The value is the interpreter command (e.g.,
C:\Python22\Python.Exe -u %s
%s). You may also use file extensions such as
.cgi or .py for this
purpose, but I recommend a unique one such as
.pyg instead. Assigning Python as the
interpreter for all scripts named .cgi might
interfere with your ability to use other interpreters for CGI
purposes. Having all modules with a .py
extension interpreted as CGI scripts is more accident-prone than
dedicating a unique extension such as .pyg to
this purpose, and may interfere with your ability to have your
Python-coded CGI scripts import utility modules from the same
directories.
With IIS 5, you can use the Administrative Tools →
Computer Management applet to associate a file extension with an
interpreter command line. This is performed via Services and
Applications → Internet Information Services.
Right-click either on [IISAdmin], for all sites, or on a specific web
site, and choose Properties → Configuration →
Add Mappings → Add. Enter the
extension, such as .pyg, in the Extension field,
and the interpreter command line, such as
C:\Python22\Python.Exe -u %s %s, in the Executable
field.
20.1.4.2 Python CGI scripts on Apache
The popular free web server Apache is
configured via directives in a text file (by default,
httpd.conf). When the configuration has
ScriptAlias entries, such as:
ScriptAlias /cgi-bin/ /usr/local/apache/cgi-bin/
any executable script in the aliased directory can run as a CGI
script. You may also enable CGI execution in a specific directory by
using for that directory the Apache directive:
Options +ExecCGI
In this case, to let scripts with a certain extension run as CGI
scripts, you may also add a global AddHandler
directive, such as:
AddHandler cgi-script pyg
to enable scripts with extension .pyg to run as
CGI scripts. Apache determines what interpreter to use for a script
by the shebang line at the script's start. Another
way to enable CGI scripts in a directory (if global directive
AllowOverride Options is set)
is to use Options +ExecCGI in a file named
.htaccess in that directory.
20.1.4.3 Python CGI scripts on Xitami
The free, lightweight,
simple web server Xitami (http://www.xitami.org) makes it easy to
install CGI scripts. When any component of a URL is named
cgi-bin, Xitami takes the URL as a request for
CGI execution. Xitami determines what interpreter to use for a script
by the shebang line at the script's start, even on
Windows platforms.