

Webmaster in a Nutshell, 3rd Edition

Chapter 17. HTTP

The Hypertext Transfer Protocol (HTTP) is the language web clients and servers use to communicate with each other. It is essentially the backbone of the World Wide Web. While HTTP is largely the realm of server and client programming, a firm understanding of HTTP is also important for CGI programming. In addition, sometimes HTTP filters back to the users—for example, when server error codes are reported in a browser window.

This chapter covers all the basics of HTTP. For absolutely complete coverage of HTTP and all its surrounding technologies, see HTTP: The Definitive Guide by David Gourley and Brian Totty, with Marjorie Sayer, Sailu Reddy, and Anshu Aggarwal (O'Reilly).

All HTTP transactions follow the same general format. Each client request and server response has three parts: the request or response line, a header section, and the entity body. The client initiates a transaction as follows:

  1. The client contacts the server at a designated port number (by default, 80). It sends a document request by specifying an HTTP command called a method, followed by a document address, and an HTTP version number. For example:

    GET /index.html HTTP/1.1

    This makes use of the GET method to request the document index.html using Version 1.1 of HTTP. HTTP methods are discussed in more detail later in this chapter.

  2. Next, the client sends optional header information to inform the server of its configuration and the document formats it will accept. All header information is given line by line, each with a header name and value. For example, this header information sent by the client indicates its name and version number and specifies several document preferences:

    User-Agent: Mozilla/4.05(WinNT; I)
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*

    The client sends a blank line to end the header.

  3. After sending the request and headers, the client may send additional data. This data is mostly used by CGI programs that use the POST method. It may also be used by clients like Netscape Navigator Professional Edition to publish an edited page back onto the web server.
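
The three client steps above can be sketched in Python as follows. This is only an illustrative sketch: the helper name build_request is hypothetical, and it assumes the conventional CRLF line ending, which the text above does not spell out.

```python
def build_request(method, path, headers, body=b"", version="HTTP/1.1"):
    """Assemble a raw HTTP request: request line, headers, blank line, body.

    build_request is a hypothetical helper used for illustration only.
    """
    lines = [f"{method} {path} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Each line ends in CRLF; an empty line marks the end of the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    # Any additional data (for example, POST input) follows the blank line.
    return head.encode("ascii") + body


request = build_request(
    "GET",
    "/index.html",
    {
        "User-Agent": "Mozilla/4.05(WinNT; I)",
        "Accept": "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*",
    },
)
print(request.decode("ascii"))
```

The result is a byte string that could be written to a socket connected to the server's port.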

The server responds in the following way to the client's request:

  1. The server replies with a status line containing three fields: HTTP version, status code, and description. The HTTP version indicates the version of HTTP the server is using to respond. The status code is a three-digit number that indicates the result of the client's request. The description following the status code is simply human-readable text that describes the status code. For example:

    HTTP/1.1 200 OK

    This status line indicates that the server uses Version 1.1 of HTTP in its response. A status code of 200 means that the client's request was successful, and the requested data will be supplied after the headers.

  2. After the status line, the server sends header information to the client about itself and the requested document. For example:

    Date: Sun, 20 Sep 1998 08:17:58 GMT
    Server: NCSA/1.5.2
    Last-modified: Wed, 17 Jun 1998 21:53:08 GMT
    Content-type: text/html
    Content-length: 2482

    A blank line ends the header.

  3. If the client's request is successful, the requested data is sent. This data may be a copy of a file or the response from a CGI program. If the client's request could not be fulfilled, the additional data may be a human-readable explanation of why the server could not fulfill the request.
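
A client might take a response apart along the same three-part structure. The following Python sketch assumes a complete, well-formed response already held in memory. The function name parse_response and the sample body are illustrative, not part of any library.

```python
def parse_response(raw):
    """Split a raw HTTP response into status line fields, headers, and body.

    parse_response is a hypothetical helper; it does no error handling.
    """
    # The first blank line separates the header section from the data.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # The description may contain spaces, so split at most twice.
    version, code, description = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive; store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), description, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: NCSA/1.5.2\r\n"
    b"Content-type: text/html\r\n"
    b"Content-length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)
version, code, description, headers, body = parse_response(raw)
print(version, code, description)          # HTTP/1.1 200 OK
print(headers["content-type"], len(body))  # text/html 13
```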

In HTTP 1.0, after the server has finished sending the requested data, it disconnects from the client, and the transaction is over unless a Connection: Keep-Alive header is sent. Beginning with HTTP 1.1, however, the default is for the server to maintain the connection and allow the client to make additional requests. Since many documents embed other documents (inline images, frames, applets, etc.), this saves the overhead of the client having to repeatedly connect to the same server just to draw a single page. Under HTTP 1.1, therefore, the transaction might cycle back to the beginning, until either the client or server explicitly closes the connection.
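
The difference between the two versions can be expressed as a small decision function. This is a simplified sketch: the name connection_persists is hypothetical, headers are assumed to be a dictionary keyed by lowercase header name, and it assumes that an explicit close is signaled with a Connection: close header.

```python
def connection_persists(version, headers):
    """Return True if the connection should stay open after a response.

    `headers` maps lowercase header names to values (an assumption of
    this sketch, not a requirement of the protocol).
    """
    value = headers.get("connection", "")
    tokens = {token.strip().lower() for token in value.split(",")}
    if "close" in tokens:
        return False                # either side asked to close
    if version == "HTTP/1.1":
        return True                 # persistent by default
    return "keep-alive" in tokens   # HTTP 1.0 must opt in


print(connection_persists("HTTP/1.0", {}))                            # False
print(connection_persists("HTTP/1.0", {"connection": "Keep-Alive"}))  # True
print(connection_persists("HTTP/1.1", {}))                            # True
print(connection_persists("HTTP/1.1", {"connection": "close"}))       # False
```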

Being a stateless protocol, HTTP does not maintain any information from one transaction to the next, so the next transaction needs to start all over again. The advantage is that an HTTP server can serve a lot more clients in a given period of time, since there's no additional overhead for tracking sessions from one connection to the next. The disadvantage is that more elaborate CGI programs need to use hidden input fields (as described in Chapter 6), or external tools such as cookies (described later in this chapter) to maintain information from one transaction to the next.

17.1. Client Requests

Client requests are broken into three sections. The first line of a message always contains an HTTP command called a method, a URI that identifies the file or resource the client is querying, and the HTTP version number. The second section of a client request contains header information, which provides information about the client and the data entity it is sending the server. The third part of a client request is the entity body, the data being sent to the server.

A Uniform Resource Identifier (URI) is a general term for all valid formats of addressing schemes supported on the World Wide Web. The one in common use now is the Uniform Resource Locator (URL) addressing scheme. See Chapter 1 for more information on URLs.

17.1.1. Methods

A method is an HTTP command that begins the first line of a client request. The method tells the server the purpose of the client request. Three methods are in common use: GET, HEAD, and POST. Other methods are also defined but are not as widely supported by servers, although their use is expected to grow. Methods are case-sensitive, so "GET" is different from "get."
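
As a rough illustration of how a server might read the first line of a request, the Python sketch below splits the line into its three fields and compares the method exactly, without folding case. The names KNOWN_METHODS and parse_request_line are hypothetical, and a real server would accept more methods than these three.

```python
# Methods this sketch accepts; the set is an assumption for illustration.
KNOWN_METHODS = {"GET", "HEAD", "POST"}


def parse_request_line(line):
    """Split a request line into (method, URI, version)."""
    method, uri, version = line.split(" ")
    # The comparison is exact: "get" does not match "GET".
    if method not in KNOWN_METHODS:
        raise ValueError(f"unsupported method: {method!r}")
    return method, uri, version


print(parse_request_line("GET /index.html HTTP/1.1"))

try:
    parse_request_line("get /index.html HTTP/1.1")
except ValueError as error:
    print(error)
```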

17.1.1.1. The GET method

The GET method is a request for information located at a specified URI on the server. It is the most commonly used method by browsers to retrieve information. The result of a GET request can be generated in many different ways; it can be a file accessible by the server, the output of a program or CGI script, the output from a hardware device, etc.

When a client uses the GET method in its request, the server responds with a status line, headers, and the requested data. If the server cannot process the request due to an error or lack of authorization, the server usually sends a textual explanation in the data portion of the response.

The entity-body portion of a GET request is always empty. GET is basically used to say "Give me this file." The file or program the client requests is usually identified by its full pathname on the server.

Here is an example of a successful GET request to retrieve a file. The client sends:

GET /index.html HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/2.02Gold (WinNT; I)
Host: www.oreilly.com
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*

The server responds with:

HTTP/1.0 200 Document follows
Date: Sun, 20 Sep 1998 08:17:58 GMT
Server: NCSA/1.5.2
Last-modified: Wed, 17 Jun 1998 21:53:08 GMT
Content-type: text/html
Content-length: 2482

(body of document here)

The GET method is also used to send input to programs like CGI through form tags. Since GET requests have empty entity-bodies, the input data is appended to the URL in the GET line of the request. When a <form> tag specifies the method="GET" attribute value, key-value pairs representing the input from the form are appended to the URL following a question mark (?). Pairs are separated by an ampersand (&). For example:

GET /cgi-bin/birthday.pl?month=august&date=24 HTTP/1.0

This causes the server to send the birthday.pl CGI program the month and date values specified in a form on the client. The input data at the end of the URL is encoded according to the CGI specification. To include a special character literally, the client writes it as a percent sign followed by its two-digit hexadecimal code (an ampersand, for example, becomes %26). The character encoding is described in Chapter 12.
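
Python's standard urllib.parse module can produce and decode this kind of query string. The sketch below rebuilds the request line from the example. The "Tom & Jerry" value is made up to show how a special character is escaped.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Client side: encode the form's key-value pairs and append them after "?".
query = urlencode({"month": "august", "date": "24"})
request_line = f"GET /cgi-bin/birthday.pl?{query} HTTP/1.0"
print(request_line)

# Special characters are escaped; urlencode writes a space as "+".
print(urlencode({"name": "Tom & Jerry"}))  # name=Tom+%26+Jerry

# Server side: split the query off the path and decode the pairs.
parts = urlsplit("/cgi-bin/birthday.pl?" + query)
params = parse_qs(parts.query)
print(parts.path, params)
```

Note that parse_qs returns a list for each key, because a form may submit the same key more than once.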

The GET method can also supply extra-path information in the same manner. This is achieved by adding the extra path after the URL, for example, /cgi-bin/display.pl/cgi/cgi_doc.txt. The server gauges where the program's name ends (display.pl); everything after that is read as the extra path.
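
How a server finds the end of the program name is implementation-specific. One possible approach, sketched here in Python, is to try progressively longer prefixes of the path until one matches a known program. The function split_path_info and the scripts set are hypothetical and do not reflect how any particular server does it.

```python
def split_path_info(path, scripts):
    """Split a request path into (program name, extra path).

    `scripts` is the set of program paths the server knows about
    (a stand-in for checking the filesystem).
    """
    segments = path.split("/")
    # Try the shortest prefix first, then add one segment at a time.
    for count in range(1, len(segments) + 1):
        candidate = "/".join(segments[:count])
        if candidate in scripts:
            return candidate, path[len(candidate):]
    raise LookupError(f"no program found in {path!r}")


program, extra = split_path_info(
    "/cgi-bin/display.pl/cgi/cgi_doc.txt",
    scripts={"/cgi-bin/display.pl"},
)
print(program)  # /cgi-bin/display.pl
print(extra)    # /cgi/cgi_doc.txt
```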

17.1.1.2. The HEAD method

The HEAD method is functionally like GET except that the server will not send anything in the data portion of the reply. The HEAD method requests only the header information on a file or resource. The header information from a HEAD request should be the same as that from a GET request.

This method is used when the client wants to find out information about the document and not retrieve it. Many applications exist for the HEAD method. For example, the client may desire the following information:

  • The modification time of a document, useful for cache-related queries

  • The size of a document, useful for page layout, estimating arrival time, or determining whether to request a smaller version of the document

  • The type of the document, to allow the client to examine only documents of a certain type

  • The type of server, to allow customized server queries

It is important to note that most of the header information provided by a server is optional and may not be given by all servers. A good design for web clients is to allow flexibility in the server response and take default actions when desired header information is not given by the server.
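
A client can follow this advice by supplying a default for each header it looks at. The Python sketch below is one way to do so. The function name describe_document and the chosen defaults (such as treating an untyped document as application/octet-stream) are assumptions made for illustration.

```python
def describe_document(headers):
    """Summarize response headers, falling back to defaults when absent.

    `headers` maps lowercase header names to values.
    """
    length = headers.get("content-length")
    return {
        # Default type is an assumption of this sketch.
        "type": headers.get("content-type", "application/octet-stream"),
        # Size is unknown (None) when the server does not report it.
        "size": int(length) if length is not None else None,
        "modified": headers.get("last-modified"),
        "server": headers.get("server", "unknown"),
    }


full = describe_document({
    "server": "NCSA/1.5.2",
    "content-type": "text/html",
    "content-length": "2482",
})
sparse = describe_document({})
print(full)
print(sparse)
```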

The following is an example HTTP transaction using the HEAD request. The client sends:

HEAD /index.html HTTP/1.1
Connection: Keep-Alive
User-Agent: Mozilla/2.02Gold (WinNT; I)
Host: www.oreilly.com
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*

The server responds with:

HTTP/1.1 200 Document follows
Date: Sun, 20 Sep 1998 08:17:58 GMT
Server: NCSA/1.5.2
Last-modified: Wed, 17 Jun 1998 21:53:08 GMT
Content-type: text/html
Content-length: 2482

(No entity body is sent in response to a HEAD request.)




Copyright © 2003 O'Reilly & Associates. All rights reserved.