Perl is the most commonly used language for CGI programming on the World Wide Web. The Common Gateway Interface (CGI) is an essential tool for creating and managing comprehensive web sites. With CGI, you can write scripts that create interactive, user-driven applications.
CGI allows the web server to communicate with other programs that are running on the same machine. For example, with CGI, the web server can invoke an external program, while passing user-specific data to the program (such as what host the user is connecting from, or input the user has supplied through an HTML form). The program then processes that data, and the server passes the program's response back to the web browser.
Rather than limiting the Web to documents written ahead of time, CGI enables web pages to be created on the fly, based upon the input of users. You can use CGI scripts to create a wide range of applications, from surveys to search tools, from Internet service gateways to quizzes and games. You can count the number of users who access a document or let them sign an electronic guestbook. You can provide users with all types of information, collect their comments, and respond to them.
For Perl programmers, there are two approaches you can take to CGI. They are:
One performance hit for CGI programs is that the Perl interpreter must be started every time a CGI script is called. To improve performance on Apache systems, the mod_perl Apache module embeds the Perl interpreter directly into the server, avoiding this startup overhead. Chapter 11, Web Server Programming with mod_perl, covers installing and using mod_perl.
For an example of a CGI application, suppose you create a guestbook for your website. The guestbook page asks users to submit their first name and last name using a fill-in form composed of two input text fields. Figure 9.1 shows the form you might see in your browser window.
The HTML that produces this form might read as follows:
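```html
<HTML><HEAD><TITLE>Guestbook</TITLE></HEAD>
<BODY>
<H1>Fill in my guestbook!</H1>
<FORM METHOD="GET" ACTION="/cgi-bin/guestbook.pl">
<PRE>
First Name:     <INPUT TYPE="TEXT" NAME="firstname">
Last Name:      <INPUT TYPE="TEXT" NAME="lastname">

<INPUT TYPE="SUBMIT">   <INPUT TYPE="RESET">
</PRE>
</FORM>
</BODY></HTML>
```

The form is written using special form tags. The METHOD attribute of the <FORM> tag specifies how the data is sent (here, GET), and its ACTION attribute names the CGI program that receives it (here, /cgi-bin/guestbook.pl). Each <INPUT> tag defines one element of the form: the two TEXT fields, named firstname and lastname, hold the user's entries, and the SUBMIT and RESET buttons send the form or clear it.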
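When the user presses the "submit" button, the data entered into the text fields is passed to the CGI program named in the form's ACTION attribute, /cgi-bin/guestbook.pl.

Parameters to a CGI program are transferred either in the URL or in the body text of the request. The method used to pass parameters is determined by the METHOD attribute of the <FORM> tag. The GET method transfers the data within the URL itself; for example, under the GET method the browser might begin the HTTP transaction as follows:

```
GET /cgi-bin/guestbook.pl?firstname=Joe&lastname=Schmoe HTTP/1.0
```

The POST method uses the body of the HTTP request to pass the parameters. The same transaction with the POST method would read:

```
POST /cgi-bin/guestbook.pl HTTP/1.0
... [More headers here]

firstname=Joe&lastname=Schmoe
```

In both of these examples, you should recognize the firstname and lastname variable names defined in the HTML form, each paired with the value the user entered. An ampersand (&) separates the variable=value pairs.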
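Whichever method is used, the program eventually has to split the string at each ampersand and undo the URL encoding. The following is a minimal Perl sketch of that step, assuming the standard application/x-www-form-urlencoded rules (+ stands for a space, %XX for an escaped byte). The subroutine name parse_form_data is hypothetical; real scripts usually leave this job to a module such as CGI.pm.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "firstname=Joe&lastname=Schmoe" into a hash.
# Assumes application/x-www-form-urlencoded input, where "+" encodes a
# space and "%XX" encodes one byte as two hex digits.
sub parse_form_data {
    my ($raw) = @_;
    my %params;
    for my $pair (split /&/, $raw) {
        next if $pair eq '';
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        for ($name, $value) {                      # aliases both variables
            tr/+/ /;                               # plus sign means space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # decode %XX escapes
        }
        $params{$name} = $value;    # a repeated name keeps its last value
    }
    return %params;
}

my %form = parse_form_data('firstname=Joe&lastname=Schmoe');
print "$form{firstname} $form{lastname}\n";    # prints "Joe Schmoe"
```

Keeping only the last value for a repeated name is a simplification; fields that can legitimately send several values, such as multiple-selection lists, would need an array of values per name.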
The server now passes the variable=value pairs to the CGI program. It does this either through Unix environment variables or in standard input (STDIN).
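If the CGI program is called with the GET method, then parameters are expected to be embedded into the URL of the request, and the server transfers them to the program by assigning them to the QUERY_STRING environment variable. The CGI program can then retrieve the parameters from QUERY_STRING as it would read any environment variable (for example, from the %ENV hash in Perl). If the CGI program is called with the POST method, the server instead delivers the parameters on standard input and sets the CONTENT_LENGTH environment variable to the number of bytes the program should read.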
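A minimal Perl sketch of how a CGI program might pick up the raw parameter string, assuming the server follows the usual CGI conventions: REQUEST_METHOD names the method, QUERY_STRING carries GET data, and CONTENT_LENGTH gives the number of bytes of POST data waiting on STDIN. The subroutine name read_raw_params is hypothetical, and it takes the environment and the input handle as arguments so it can be exercised outside a web server.

```perl
use strict;
use warnings;

# Hypothetical helper: return the still-encoded "name=value&..." string.
# $env is a hash reference shaped like %ENV; $in is the filehandle
# (normally STDIN) from which POST data is read.
sub read_raw_params {
    my ($env, $in) = @_;
    my $method = $env->{REQUEST_METHOD} || 'GET';

    if ($method eq 'POST') {
        # Read exactly CONTENT_LENGTH bytes; the server is not required
        # to signal end-of-file after the request body.
        my $length = $env->{CONTENT_LENGTH} || 0;
        my $data   = '';
        while (length($data) < $length) {
            my $got = read($in, $data, $length - length($data), length($data));
            last unless $got;    # EOF (0) or error (undef): stop early
        }
        return $data;
    }

    # GET: the parameters arrived in the URL.
    return defined $env->{QUERY_STRING} ? $env->{QUERY_STRING} : '';
}

# In a real CGI script the call would be read_raw_params(\%ENV, \*STDIN).
my $raw = read_raw_params({ REQUEST_METHOD => 'GET',
                            QUERY_STRING   => 'firstname=Joe&lastname=Schmoe' },
                          \*STDIN);
print "$raw\n";
```

Reading in a loop rather than with a single read call guards against short reads; the string returned still has to be split into variable=value pairs and URL-decoded.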
Other environment variables defined by the server for CGI store such information as the format and length of the input, the remote host, the user, and various client information. They also store the server name, the communication protocol, and the name of the software running the server. (We provide a list of the most common CGI environment variables later in this chapter.)
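As a sketch of reading some of these variables from Perl's %ENV hash, the following CGI script reports them as plain text. The variable names are standard CGI/1.1 names matching the categories above (input format and length, remote host and user, client software, server name, protocol, and software); which of them a given server actually sets varies, so the script prints "(not set)" for any that are missing. The subroutine describe_request is hypothetical.

```perl
use strict;
use warnings;

# Standard CGI/1.1 variable names for the kinds of information listed above.
my @cgi_vars = qw(
    CONTENT_TYPE CONTENT_LENGTH
    REMOTE_HOST REMOTE_ADDR REMOTE_USER
    HTTP_USER_AGENT
    SERVER_NAME SERVER_PROTOCOL SERVER_SOFTWARE
);

# Hypothetical helper: one "NAME = value" line per variable.
sub describe_request {
    my ($env) = @_;
    my $report = '';
    for my $name (@cgi_vars) {
        my $value = defined $env->{$name} ? $env->{$name} : '(not set)';
        $report .= "$name = $value\n";
    }
    return $report;
}

# Header, blank line, then the body (see the output format described below).
print "Content-type: text/plain\n\n";
print describe_request(\%ENV);
```

Run from the command line instead of through a web server, most of these variables will show "(not set)".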
The CGI program needs to retrieve the information as appropriate and then process it. The sky's the limit on what the CGI program actually does with the information it retrieves. It might return an anagram of the user's name, or tell her how many times her name uses the letter "t," or it might just compile the name into a list that the programmer regularly sells to telemarketers. Only the programmer knows for sure.
Regardless of what the CGI program does with its input, it's responsible for giving the browser something to display when it's done. It must either create a new document to be served to the browser or point to an existing document. On Unix, programs send their output to standard output (STDOUT) as a data stream that consists of two parts. The first part is either a full or partial HTTP header that (at minimum) describes the format of the returned data (e.g., HTML, ASCII text, GIF, etc.). A blank line signifies the end of the header section. The second part is the body of the output, which contains the data conforming to the format type reflected in the header. For example:
```
Content-type: text/html

<HTML>
<HEAD><TITLE>Thanks!</TITLE></HEAD>
<BODY><H1>Thanks for signing my guest book!</H1>
...
</BODY></HTML>
```

In this case, the only header line generated is Content-type, which gives the media format of the output as HTML (text/html).
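A minimal Perl sketch of a program producing that output. The subroutine name thanks_page is hypothetical; the response is built as a string only so that it can be inspected before being printed to STDOUT, where the server picks it up and adds its own headers.

```perl
use strict;
use warnings;

# Hypothetical helper: build the two-part CGI output (header, blank line, body).
sub thanks_page {
    my $body = join "\n",
        '<HTML>',
        '<HEAD><TITLE>Thanks!</TITLE></HEAD>',
        '<BODY><H1>Thanks for signing my guest book!</H1>',
        '</BODY></HTML>',
        '';                                   # trailing newline
    # The first "\n" ends the Content-type line; the second is the blank
    # line that separates the header from the body.
    return "Content-type: text/html\n\n" . $body;
}

print thanks_page();
```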
The server transfers the results of the CGI program back to the browser. The body text is not modified or interpreted by the server in any way, but the server generally supplies additional headers with information such as the date, the name and version of the server, etc.
A CGI program can also supply a complete HTTP header itself, in which case the server adds no headers of its own and instead transfers the response verbatim as returned by the CGI program. The server needs to be configured to allow this behavior; see your server documentation on NPH (non-parsed header) scripts for more information.
Here is the sample output of a program generating an HTML virtual document, with a complete HTTP header:
```
HTTP/1.0 200 OK
Date: Thursday, 28-June-96 11:12:21 GMT
Server: NCSA/1.4.2
Content-type: text/html
Content-length: 2041

<HTML>
<HEAD><TITLE>Thanks!</TITLE></HEAD>
<BODY>
<H1>Thanks for signing my guestbook!</H1>
...
</BODY>
</HTML>
```

The header contains the communication protocol, the date and time of the response, and the server name and version. (200 is the HTTP status code reporting that the request succeeded.)
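A sketch of how an NPH script might assemble such a response in Perl. The subroutine names http_date and nph_response are hypothetical; the Server value is taken from the SERVER_SOFTWARE environment variable when the server provides it, and the body is assumed to be a byte string so that length gives the correct Content-length. Because no server touches these headers, the script must terminate them with CRLF itself. The date uses the RFC 1123 form (for example, Thu, 01 Jan 1970 00:00:00 GMT) rather than the older format shown above. On Apache, for instance, a script is treated as NPH when its file name begins with nph-.

```perl
use strict;
use warnings;

my @days   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @months = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical helper: format an epoch time as an RFC 1123 HTTP date.
# Names come from fixed lists, so the current locale does not matter.
sub http_date {
    my ($time) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($time);
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $days[$wday], $mday, $months[$mon], $year + 1900, $hour, $min, $sec;
}

# Hypothetical helper: the complete response an NPH script must send,
# since the server passes it to the browser without adding anything.
sub nph_response {
    my ($html) = @_;
    my $server = defined $ENV{SERVER_SOFTWARE} ? $ENV{SERVER_SOFTWARE} : 'unknown';
    return join "\r\n",
        'HTTP/1.0 200 OK',
        'Date: ' . http_date(time),
        "Server: $server",
        'Content-type: text/html',
        'Content-length: ' . length($html),    # bytes, assuming a byte string
        '',                                     # empty line ends the header
        $html;
}

binmode STDOUT;    # keep bytes unchanged so Content-length stays accurate
print nph_response("<HTML><BODY><H1>Thanks for signing my guestbook!</H1></BODY></HTML>\n");
```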