Chapter 12. CGI Overview
CGI (the Common Gateway Interface) allows the web server to communicate with other programs that are running on the server. For example, with CGI, the web server can invoke an external program and pass it user-specific data (such as the host the user is connecting from, or input the user has supplied through an HTML form). The program then processes that data, and the server passes the program's response back to the web browser.
Rather than limiting the Web to documents written ahead of time, CGI enables web pages to be created on the fly, based upon the input of users. You can use CGI scripts to create a wide range of applications, from surveys to search tools, from Internet service gateways to quizzes and games. You can count the number of users who access a document or let them sign an electronic guestbook. You can provide users with all types of information, collect their comments, and respond.
This section provides a reference for the essential components of CGI. For a comprehensive treatment of CGI programming we recommend O'Reilly's CGI Programming with Perl by Scott Guelich, Shishir Gundavaram and Gunther Birznieks.
CGI is covered in this chapter through Chapter 15.
12.1. A Typical CGI Interaction
For an example of a CGI application, suppose you create a guestbook for your web site. The guestbook page asks users to submit their first name and last name using a fill-in form composed of two input text fields. Figure 12-1 shows the form you might see in your browser window.
Figure 12-1. HTML form
The HTML that produces this form might read as follows:
<HTML><HEAD><TITLE>Guestbook</TITLE></HEAD>
<BODY>
<H1>Fill in my guestbook!</H1>
<FORM METHOD="GET" ACTION="/cgi-bin/guestbook.pl">
<PRE>
First Name: <INPUT TYPE="TEXT" NAME="firstname">
Last Name:  <INPUT TYPE="TEXT" NAME="lastname">

<INPUT TYPE="SUBMIT"> <INPUT TYPE="RESET">
</PRE>
</FORM>
</BODY>
</HTML>
The form is written using special "form" tags, which are discussed in detail in Chapter 6.
When the user presses the Submit button, data entered into the <input> text fields is passed to the CGI program specified by the action attribute of the <form> tag (in this case, the /cgi-bin/guestbook.pl program).
12.1.1. Transferring the Form Data
Parameters to a CGI program are transferred either in the URL or in the body text of the request. The method used to pass parameters is determined by the method attribute of the <form> tag. The GET method says to transfer the data within the URL itself; for example, under the GET method, the browser might initiate the HTTP transaction as follows:
GET /cgi-bin/guestbook.pl?firstname=Joe&lastname=Schmoe HTTP/1.1
See Chapter 17 for more information on HTTP transactions.
The POST method says to use the body portion of the HTTP request to pass parameters. The same transaction with the POST method would read as follows:
POST /cgi-bin/guestbook.pl HTTP/1.1
... [More headers here]

firstname=Joe&lastname=Schmoe
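To make the difference between the two methods concrete, here is a minimal Python sketch that builds both requests from the same form fields. The function name build_requests is hypothetical, and the two POST headers shown (Content-Type and Content-Length) are an assumption about what a browser would typically send; a real request carries more headers than this.

```python
from urllib.parse import urlencode


def build_requests(action, fields):
    """Return (GET request line, POST request text) for the same form fields."""
    # Encode the fields as name=value pairs joined by "&".
    pairs = urlencode(fields)

    # GET: the pairs travel in the URL, after a "?".
    get_line = f"GET {action}?{pairs} HTTP/1.1"

    # POST: the pairs travel in the body, after the blank line that ends the headers.
    post_text = (
        f"POST {action} HTTP/1.1\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(pairs)}\r\n"
        "\r\n"
        f"{pairs}"
    )
    return get_line, post_text


get_line, post_text = build_requests(
    "/cgi-bin/guestbook.pl", {"firstname": "Joe", "lastname": "Schmoe"}
)
print(get_line)
print(post_text)
```

The encoded string is identical in both cases; only its position in the request changes.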
In both examples, you should recognize the firstname and lastname variable names that were defined in the HTML form, coupled with the values entered by the user. An ampersand (&) is used to separate the variable=value pairs.
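As an illustration of how a program might split that string back into names and values, the following Python sketch uses the standard library's urllib.parse.parse_qs. The helper name first_value is hypothetical and exists only for this example.

```python
from urllib.parse import parse_qs

raw = "firstname=Joe&lastname=Schmoe"

# parse_qs splits on "&" and "=", and maps each name to a list of values,
# because the same name may appear more than once in a form submission.
params = parse_qs(raw)


def first_value(parsed, name, default=""):
    """Return the first value submitted for a field, or a default if absent."""
    values = parsed.get(name)
    return values[0] if values else default


print(params)
print(first_value(params, "firstname"), first_value(params, "lastname"))
```

Fields the user left empty are dropped by parse_qs unless keep_blank_values=True is passed, which is why the helper takes a default.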
The server now passes the variable=value pairs to the CGI program. It does this either through Unix environment variables or in standard input (STDIN). If the CGI program is called with the GET method, parameters are expected to be embedded in the URL of the request, and the server transfers them to the program by assigning them to the QUERY_STRING environment variable. The CGI program can then retrieve the parameters from QUERY_STRING as it would read any environment variable (for example, from the %ENV associative array in Perl). If the CGI program is called with the POST method, parameters are expected to be embedded into the body of the request, and the server passes the body text to the program as standard input.
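The retrieval step might be sketched in Python as follows. The function name read_form_data is hypothetical. The sketch also assumes the standard CGI variables REQUEST_METHOD and CONTENT_LENGTH, which this section does not name explicitly, to decide where to look and how much of standard input to read.

```python
import io
import os
import sys


def read_form_data(environ=os.environ, stdin=None):
    """Return the raw 'name=value&...' string the server handed to the program."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # The request body arrives on standard input; read only as many
        # bytes as CONTENT_LENGTH announces.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        stream = stdin if stdin is not None else sys.stdin.buffer
        return stream.read(length).decode("ascii", errors="replace")
    # For GET, the server places the part of the URL after "?" in QUERY_STRING.
    return environ.get("QUERY_STRING", "")


# Simulated invocations, so the function can be tried without a web server.
body = b"firstname=Joe&lastname=Schmoe"
get_env = {"REQUEST_METHOD": "GET", "QUERY_STRING": body.decode("ascii")}
post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))}

print(read_form_data(get_env))
print(read_form_data(post_env, io.BytesIO(body)))
```

Passing the environment and input stream as arguments is a design choice made for this example: it lets the same function be exercised from a test as well as from a real CGI invocation.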
(Other environment variables defined by the server for CGI programs are listed later in this chapter. These variables store such information as the format and length of the input, the remote host, the user, and various client information. They also store the server name, the communication protocol, and the name of the software running the server.)
The CGI program needs to retrieve the information as appropriate and then process it. The sky's the limit on what the CGI program actually does with the information it retrieves. It might return an anagram of the user's name, or tell them how many times their name uses the letter "t," or it might just compile the name into a list that the programmer regularly sells to telemarketers. Only the programmer knows for sure.
12.1.2. Creating Virtual Documents
Regardless of what the CGI program does with its input, it's responsible for giving the browser something to display when it's done. It must either create a new document to be served to the browser or point to an existing document. On Unix, programs send their output to standard output (STDOUT) as a data stream that consists of two parts. The first part is either a full or partial HTTP header that (at minimum) describes the format of the returned data (HTML, ASCII text, GIF, and so on). A blank line signifies the end of the header section. The second part is the body of the output, which contains the data conforming to the format type reflected in the header. For example:
Content-type: text/html

<HTML>
<HEAD><TITLE>Thanks!</TITLE></HEAD>
<BODY><H1>Thanks for signing my guest book!</H1>
...
</BODY></HTML>
In this case, the only header line generated is Content-type, which gives the media format of the output as HTML (text/html). This line is essential for every CGI program, since it tells the browser what kind of format to expect. The blank line separates the header from the body text (which, in this case, is in HTML format as advertised). See Chapter 17 for a listing of other media formats that are commonly recognized on the Web.
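A program producing output of this shape could look like the Python sketch below. The function name thank_you_page is hypothetical, and greeting the user by name goes slightly beyond the sample output; the name is escaped with html.escape so that form input cannot inject markup into the page.

```python
import html


def thank_you_page(firstname, lastname):
    """Return a partial header, a blank line, and an HTML body as one string."""
    # Escape user input before placing it inside HTML.
    name = html.escape(f"{firstname} {lastname}")
    body = (
        "<HTML>\n"
        "<HEAD><TITLE>Thanks!</TITLE></HEAD>\n"
        f"<BODY><H1>Thanks for signing my guest book, {name}!</H1></BODY>\n"
        "</HTML>\n"
    )
    # Header section first, then an empty line, then the body.
    return "Content-type: text/html\n\n" + body


# A CGI program would write this to standard output for the server to relay.
print(thank_you_page("Joe", "Schmoe"), end="")
```

If the blank line is omitted, the server has no way to tell where the header ends, and the response typically fails with a server error.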
Notice that it does not matter to the web server what language the CGI program is written in. On Unix platforms, the most popular language for CGI programming is Perl. Other languages used on Unix are C, C++, Tcl, and Python. On Macintosh computers, programmers use AppleScript and C/C++, and on Microsoft Windows, programmers use Visual Basic, Perl, and C/C++. As long as there's a way in a programming language to get data from the server and send data back, you can use it for CGI.
The server transfers the results of the CGI program back to the browser. The body text is not modified or interpreted by the server in any way, but the server generally supplies additional headers with information such as the date, the name and version of the server, etc. See Chapter 17 for a list of valid HTTP response headers.
A CGI program can also supply a complete HTTP header itself, in which case the server does not add any headers but instead transfers the response verbatim as returned by the CGI program. (The server may need to be configured to allow this behavior.)
Here is the sample output of a program generating an HTML virtual document, with a complete HTTP header:
HTTP/1.1 200 OK
Date: Thursday, 28-June-96 11:12:21 GMT
Server: Apache/2.0.36
Content-type: text/html
Content-length: 2041

<HTML>
<HEAD><TITLE>Thanks!</TITLE></HEAD>
<BODY>
<H1>Thanks for signing my guestbook!</H1>
...
</BODY>
</HTML>
The header contains the communication protocol, the date and time of the response, and the server name and version. (The 200 OK is a status code defined by the HTTP protocol to communicate the status of a request, in this case successful. See Chapter 17 for a list of valid HTTP status codes.) Most importantly, it also contains the content type and the length of the enclosed data in bytes, which equals the number of characters only when every character is encoded as a single byte.
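As a rough sketch of how a program might assemble such a complete response, the Python function below builds the status line and headers and computes Content-length from the encoded body. The function name full_response and the server string ExampleServer/1.0 are hypothetical, and the UTF-8 charset is an assumption made for this example. Content-length counts bytes, so it is measured after encoding; for an ASCII-only body the byte and character counts coincide.

```python
from email.utils import formatdate


def full_response(body_text, server="ExampleServer/1.0"):
    """Return a complete HTTP response (status line, headers, body) as bytes."""
    body = body_text.encode("utf-8")
    headers = [
        "HTTP/1.1 200 OK",
        f"Date: {formatdate(usegmt=True)}",   # current time, in HTTP date format
        f"Server: {server}",
        "Content-type: text/html; charset=utf-8",
        f"Content-length: {len(body)}",       # bytes, not characters
    ]
    # A blank line separates the header section from the body.
    return ("\r\n".join(headers) + "\r\n\r\n").encode("ascii") + body


page = "<HTML><BODY><H1>Thanks, José!</H1></BODY></HTML>"
response = full_response(page)
head, _, payload = response.partition(b"\r\n\r\n")
print(head.decode("ascii"))
print(len(page), "characters,", len(payload), "bytes")
```

Here the accented character occupies two bytes in UTF-8, so the byte count is one larger than the character count.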
As seen in Figure 12-2, the result is that after users click the Submit button, they see the message contained in the HTML section of the response thanking them for signing the guestbook.
Figure 12-2. Guestbook acknowledgment
Copyright © 2003 O'Reilly & Associates. All rights reserved.