An HTTP Transaction (Perl & LWP)

2.2.1. Request

An HTTP request has three parts: the request line, the headers, and the body of the request (normally used to pass form parameters).

The request line says what the client wants to do (the method), what it wants to do it to (the path), and what protocol it's speaking. Although the HTTP standard defines several methods, the most common are GET and POST. The path is part of the URL being requested (in Example 2-1 the path is /daily/2001/01/05/1.html). The protocol version is generally HTTP/1.1.
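
The three fields of the request line can be sketched with plain string handling. This is a minimal illustration in Python, not something LWP exposes; the helper names `build_request_line` and `parse_request_line` are made up for this example.

```python
def build_request_line(method, path, version="HTTP/1.1"):
    """Join the method, path, and protocol version with single spaces."""
    return f"{method} {path} {version}"


def parse_request_line(line):
    """Split a request line back into (method, path, version)."""
    method, path, version = line.strip().split(" ", 2)
    return method, path, version


line = build_request_line("GET", "/daily/2001/01/05/1.html")
print(line)
print(parse_request_line(line))
```

The default of HTTP/1.1 simply mirrors the remark that this is the version generally in use.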

Each header line consists of a key and a value (for example, User-Agent: SuperDuperBrowser/14.6). In versions of HTTP previous to 1.1, header lines were optional. In HTTP 1.1, the Host: header must be present, to name the server to which the browser is talking. This is the "server" part of the URL being requested (e.g., www.suck.com). The headers are terminated with a blank line, which must be present regardless of whether there are any headers.
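
As a rough sketch of these rules, the Python function below assembles a request line and headers, refuses an HTTP/1.1 request that lacks Host, and always emits the terminating blank line. The function name `build_request_head` is hypothetical, and the use of CRLF line endings follows the HTTP specification rather than anything stated above.

```python
def build_request_head(method, path, headers, version="HTTP/1.1"):
    """Return the request line and headers, ending with the blank line."""
    names = {name.lower() for name in headers}
    if version == "HTTP/1.1" and "host" not in names:
        raise ValueError("HTTP/1.1 requests must include a Host header")
    lines = [f"{method} {path} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Each line ends with CRLF; the extra CRLF is the blank line that
    # terminates the headers, and it is sent even when there are none.
    return "\r\n".join(lines) + "\r\n\r\n"


head = build_request_head(
    "GET",
    "/daily/2001/01/05/1.html",
    {"Host": "www.suck.com", "User-Agent": "SuperDuperBrowser/14.6"},
)
print(head)
```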

The optional message body can contain arbitrary data. If a body is sent, the request's Content-Type and Content-Length headers help the server decode the data. GET queries don't have any attached data, so this area is blank (that is, nothing is sent by the browser). For our purposes, only POST queries use this third part of the HTTP request.
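
To make the role of Content-Type and Content-Length concrete, here is a small Python sketch that encodes form parameters as a POST body and announces its type and length. The path `/search`, the form fields, and the helper name `build_post_request` are assumptions chosen for illustration only.

```python
from urllib.parse import urlencode


def build_post_request(path, host, form):
    """Return the full bytes of a form-encoded POST request."""
    # urlencode produces ASCII text such as "q=perl+lwp&max=10".
    body = urlencode(form).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"  # length of the body in bytes
        "\r\n"
    )
    return head.encode("ascii") + body


request = build_post_request(
    "/search", "www.youthere.int", {"q": "perl lwp", "max": "10"}
)
print(request.decode("ascii"))
```

A GET request built the same way would stop after the blank line, since nothing follows it.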

The following are the most useful headers sent in an HTTP request.

Host: www.youthere.int
This mandatory header line tells the server the hostname from the URL being requested. It may sound odd to be telling a server its own name, but this header line was added in HTTP 1.1 to deal with cases where a single HTTP server answers requests for several different hostnames.

User-Agent: Thing/1.23 details...
This optional header line identifies the make and model of this browser (virtual or otherwise). For an interactive browser, it's usually something like Mozilla/4.76 [en] (Win98; U) or Mozilla/4.0 (compatible; MSIE 5.12; Mac_PowerPC). By default, LWP sends a User-Agent header of libwww-perl/5.64 (or whatever your exact LWP version is).

Referer: http://www.thingamabob.int/stuff.html
This optional header line tells the remote server the URL of the page that contained a link to the page being requested.

Accept-Language: en-US, en, es, de
This optional header line tells the remote server, using language tags, the natural languages in which the user would prefer to receive content (here U.S. English, any kind of English, Spanish, or German).
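
The headers listed above could be assembled for a given URL roughly as follows. This Python sketch derives Host from the URL itself; the function name `request_headers` and the URL path `/stuff/index.html` are invented for the example, and it assumes the URL carries no username or password.

```python
from urllib.parse import urlsplit


def request_headers(url, user_agent, referer=None, languages=()):
    """Build the request headers discussed above for a request to `url`."""
    # The "server" part of the URL becomes the mandatory Host header.
    headers = {"Host": urlsplit(url).netloc, "User-Agent": user_agent}
    if referer:
        headers["Referer"] = referer  # optional: the page that linked here
    if languages:
        headers["Accept-Language"] = ", ".join(languages)
    return headers


headers = request_headers(
    "http://www.youthere.int/stuff/index.html",
    "Thing/1.23",
    referer="http://www.thingamabob.int/stuff.html",
    languages=["en-US", "en", "es", "de"],
)
for name, value in headers.items():
    print(f"{name}: {value}")
```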

2.2.2. Response

The server's response also has three parts: the status line, some headers, and an optional body.

The status line states which protocol the server is speaking, then gives a numeric status code and a short message, for example "HTTP/1.1 404 Not Found". The numeric status codes are grouped by their first digit: 100-199 are informational, 200-299 are success, 300-399 are redirections, 400-499 are client errors (generally permanent failures), and 500-599 are server errors. A full list of HTTP status codes is given in Appendix B, "HTTP Status Codes".
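
A status line can be taken apart, and its code assigned to a group, with a few lines of Python. This is only a sketch: the helper names are hypothetical, and the group labels follow the categories in the HTTP specification.

```python
def parse_status_line(line):
    """Split a status line into (protocol, numeric code, message)."""
    parts = line.strip().split(" ", 2)
    protocol, code = parts[0], int(parts[1])
    message = parts[2] if len(parts) > 2 else ""  # the message may be empty
    return protocol, code, message


def status_class(code):
    """Name the group that the first digit of a status code places it in."""
    groups = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return groups.get(code // 100, "unknown")


protocol, code, message = parse_status_line("HTTP/1.1 404 Not Found")
print(protocol, code, message, "->", status_class(code))
```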

The header lines let the server send additional information about the response. For example, if authentication is required, the server uses headers to indicate the type of authentication. The most common header—almost always present for both successful and unsuccessful requests—is Content-Type, which helps the browser interpret the body. Headers are terminated with a blank line, which must be present even if no headers are sent.
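
Reading response headers up to the blank line might look like the following Python sketch. The name `parse_headers` and the sample realm "members" are assumptions; the sketch also keeps only the last value when a header name repeats, which a real client would handle more carefully.

```python
def parse_headers(lines):
    """Collect header lines into a dict, stopping at the blank line."""
    headers = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            break  # the blank line ends the headers
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return headers


lines = [
    "Content-Type: text/html\r\n",
    'WWW-Authenticate: Basic realm="members"\r\n',
    "\r\n",
    "this line belongs to the body and is never reached\r\n",
]
print(parse_headers(lines))
```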

Many responses contain a Content-Length line that specifies the length, in bytes, of the body. However, this line is rarely present on dynamically generated pages, and because you never know which pages are dynamically generated, you can't rely on that header line being there.
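
One way to cope with a missing Content-Length is to fall back to reading until the server closes the connection. The Python sketch below shows that decision on an in-memory stream; `read_body` is a hypothetical helper, it expects lowercased header names, and it does not handle chunked transfer encoding.

```python
import io


def read_body(stream, headers):
    """Read a response body from a buffered binary stream.

    The stream must already be positioned just after the blank line.
    """
    length = headers.get("content-length")
    if length is not None:
        return stream.read(int(length))  # read the announced number of bytes
    return stream.read()  # no length given: read until the stream ends


with_length = read_body(io.BytesIO(b"<html>hi</html>EXTRA"), {"content-length": "15"})
without_length = read_body(io.BytesIO(b"<html>hi</html>"), {})
print(with_length, without_length)
```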

(Other, rarer header lines are used for specifying that the content has moved to a given URL, or that the server wants the browser to send HTTP cookies, and so on; however, these things are generally handled for you automatically by LWP.)

The body of the response follows the blank line and can be any arbitrary data. In the case of a typical web request, this is the HTML document to be displayed. If an error occurs, the message body doesn't contain the document that was requested but usually consists of a server-generated error message (generally in HTML, but sometimes not) explaining the error.
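
Putting the three parts together, a raw response can be divided at the first blank line. The Python sketch below uses a made-up 404 response to show that the body of an error is still just data after the headers; `split_response` is a hypothetical name.

```python
def split_response(raw):
    """Split raw response bytes into (status line, header lines, body)."""
    # Everything before the first blank line is the head; the rest is the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    return status_line, header_lines, body


raw = (
    b"HTTP/1.1 404 Not Found\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html><body><h1>Not Found</h1></body></html>"
)
status_line, header_lines, body = split_response(raw)
print(status_line)
print(header_lines)
print(body)
```

The body is left as bytes because it can be arbitrary data, not necessarily text.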

