

CGI Programming with Perl

2.3. Browser Requests

Every HTTP interaction starts with a request from a client, typically a web browser. A user provides a URL to the browser by typing it in, clicking on a hyperlink, or selecting a bookmark, and the browser fetches the corresponding document. To do that, it must create an HTTP request (see Figure 2-4).


Figure 2-4. The structure of HTTP request headers

Recall that in our previous example, a web browser generated the following request when it was asked to fetch the URL http://localhost/index.html:

GET /index.html HTTP/1.1
Host: localhost
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/xbm, */*
Accept-Language: en
Connection: Keep-Alive
User-Agent: Mozilla/4.0 (compatible; MSIE 4.5; Mac_PowerPC)
.
.
.

From our discussion of URLs, you know that the URL can be broken down into multiple elements. The browser creates a network connection by using the hostname and the port number (80 by default). The scheme (http) tells our web browser that it is using the HTTP protocol, so once the connection is established, it sends an HTTP request for the resource. The first line of an HTTP request is the request line, which includes a full virtual path and query string (if present); see Figure 2-5.


Figure 2-5. The request line
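
As a rough illustration of the steps just described, the following Python sketch splits a URL into the pieces the browser uses to open the connection (hostname and port) and the pieces it sends in the request (path and query string). Python is used only for illustration here, the function name build_request is hypothetical, and a real browser sends many more header fields than this.

```python
from urllib.parse import urlsplit

def build_request(url):
    """Return (host, port, request_text) for a plain http:// URL.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles the http scheme")
    host = parts.hostname
    port = parts.port or 80            # 80 is the default port for http
    path = parts.path or "/"           # an empty path means the root resource
    if parts.query:
        path += "?" + parts.query      # the query string travels in the request line
    # The Host field carries the port only when it is not the default.
    host_field = host if port == 80 else f"{host}:{port}"
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host_field}\r\n"
        "\r\n"                         # a blank line ends the header section
    )
    return host, port, request

host, port, request = build_request("http://localhost/index.html")
print(host, port)                      # localhost 80
print(request.splitlines()[0])         # GET /index.html HTTP/1.1
```

The hostname and port are used only to reach the server; everything after them in the URL ends up in the request line.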

2.3.1. The Request Line

The first line of an HTTP request includes the request method, the URL of the resource being requested, and the version string of the protocol. Request methods are case-sensitive and uppercase. HTTP defines several request methods, although a web server may not make all of them available for each resource (see Table 2-1). The version string is the name and version of the protocol separated by a slash. HTTP 1.0 and HTTP 1.1 are represented as HTTP/1.0 and HTTP/1.1. Note that https requests also produce one of these two HTTP protocol strings.
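
To make the three parts of the request line concrete, here is a minimal Python sketch that splits one into method, URL, and protocol version. The function name parse_request_line and the KNOWN_METHODS set are hypothetical names chosen for this example, and the sketch assumes the parts are separated by single spaces.

```python
# Request methods defined by HTTP; the comparison below is case-sensitive.
KNOWN_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE",
                 "CONNECT", "OPTIONS", "TRACE"}

def parse_request_line(line):
    """Split a request line into (method, url, version_number)."""
    # Unpacking raises ValueError if there are not exactly three parts.
    method, url, version = line.rstrip("\r\n").split(" ")
    if method not in KNOWN_METHODS:
        raise ValueError(f"unknown or wrongly cased method: {method!r}")
    # The version string is the protocol name and number separated by a slash.
    name, _, number = version.partition("/")
    if name != "HTTP" or not number:
        raise ValueError(f"bad version string: {version!r}")
    return method, url, number

print(parse_request_line("GET /index.html HTTP/1.1"))
# ('GET', '/index.html', '1.1')
```

Because methods are case-sensitive, a line beginning with get instead of GET is rejected by this sketch.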

Table 2-1. HTTP Request Methods

Method     Description
GET        Asks the server for the given resource
HEAD       Used in the same cases that a GET is used but it only returns HTTP headers and no content
POST       Asks the server to modify information stored on the server
PUT        Asks the server to create or replace a resource on the server
DELETE     Asks the server to delete a resource on the server
CONNECT    Used to allow secure SSL connections to tunnel through HTTP connections
OPTIONS    Asks the server to list the request methods available for the given resource
TRACE      Asks the server to echo back the request headers as it received them

Of the request methods listed in Table 2-1, the three you will encounter most often when writing CGI scripts are GET, HEAD, and POST. However, let's first take a look at why the PUT and DELETE methods are not used with CGI.

2.3.2. Request Header Field Lines

The client generally sends several header fields with its request. As mentioned earlier, these consist of a field name, a colon, some combination of spaces or tabs (although one space is most common), and a value (see Figure 2-6). These fields are used to pass additional information about the request or about the client, or to add conditions to the request. We'll discuss the common browser headers here; they are listed in Table 2-2. Those connected with content negotiation and caching are discussed later in this chapter.


Figure 2-6. A header field line
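
The structure of a header field line (name, colon, optional spaces or tabs, value) can be sketched in a few lines of Python. The function names parse_header_line and parse_headers are hypothetical. Storing the names in lowercase is a design choice of this sketch, reflecting the fact that HTTP field names are not case-sensitive.

```python
def parse_header_line(line):
    """Split one header field line into (name, value)."""
    # Split on the first colon only; values such as URLs may contain colons.
    name, sep, value = line.rstrip("\r\n").partition(":")
    if not sep or not name:
        raise ValueError(f"not a header field line: {line!r}")
    # Any spaces or tabs around the value are not part of it.
    return name, value.strip(" \t")

def parse_headers(lines):
    """Collect header field lines into a dict keyed by lowercase field name."""
    headers = {}
    for line in lines:
        name, value = parse_header_line(line)
        headers[name.lower()] = value
    return headers

headers = parse_headers([
    "Host: localhost",
    "Accept-Language: en",
    "Connection: Keep-Alive",
])
print(headers["host"])          # localhost
print(headers["connection"])    # Keep-Alive
```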

Table 2-2. Common HTTP Request Headers

Header            Description
Host              Specifies the target hostname
Content-Length    Specifies the length (in bytes) of the request content
Content-Type      Specifies the media type of the request
Authorization     Specifies the username and password of the user requesting the resource
User-Agent        Specifies the name, version, and platform of the client
Referer           Specifies the URL that referred the user to the current resource
Cookie            Returns a name/value pair set by the server on a previous response

2.3.2.4. Authorization

Web servers can require a login for access to some resources. If you have ever attempted to access a restricted area of a web site and been prompted for a login and password, then you have encountered this form of HTTP authentication (see Figure 2-7).[3] Note that the login prompt includes text identifying what you are logging in to; this is the realm. Resources that share the same login are part of the same realm. For most web servers, you assign resources to a realm by putting them in the same directory and configuring the web server to assign the directory a name for the realm along with authorization requirements. For example, if you wanted to restrict access to URL paths that begin with /protected, then you would add the following to httpd.conf (or access.conf, if you are using it):

[3] The distinction between authentication and authorization is subtle, but important. Authentication is the process of identifying someone. Authorization determines what that person can access.

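<Location /protected>
  # Use HTTP basic authentication for this location
  AuthType Basic
  # The realm name shown in the browser's login prompt
  AuthName "The Secret Files"
  # File of usernames and encrypted passwords, maintained with htpasswd
  AuthUserFile  /usr/local/apache/conf/secret.users
  # Allow any user in the file who supplies the correct password
  Require valid-user
</Location>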

Figure 2-7. Prompt presented to the user for HTTP authorization

The user file contains usernames and encrypted passwords separated by a colon. You can use the htpasswd utility that comes with Apache to create and update this file; refer to its manpage or the Apache manual for usage. When the browser requests a resource in a restricted realm, the server informs the browser that it requires login information by sending a 401 status code and the name of the realm in the WWW-Authenticate header (we'll discuss this later in the chapter). The browser then prompts the user for a username and password for this realm (if it hasn't done so already) and resends the request with the credentials in an Authorization field. There are multiple types of HTTP authentication, but the only type that is widely supported by browsers and servers is basic authentication.
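
The server's side of this exchange can be sketched roughly as follows. This is only an illustration in Python of what the 401 response looks like on the wire, not how any particular web server implements it; the function names build_challenge and needs_challenge are hypothetical.

```python
def build_challenge(realm):
    """Return the status line and header a server sends to request a login."""
    return (
        "HTTP/1.1 401 Unauthorized\r\n"
        f'WWW-Authenticate: Basic realm="{realm}"\r\n'
        "\r\n"
    )

def needs_challenge(request_headers):
    """True when the request carries no Authorization field at all."""
    # Field names are compared without regard to case.
    return not any(name.lower() == "authorization" for name in request_headers)

first_request = {"Host": "localhost"}
if needs_challenge(first_request):
    print(build_challenge("The Secret Files"))
```

The realm in the WWW-Authenticate field is the same string the browser shows in its login prompt.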

The Authorization field for basic authentication looks like this:

Authorization: Basic dXNlcjpwYXNzd29yZA==

The encoded portion is simply the username and password joined with a colon and Base64-encoded. This can be easily decoded, so basic authentication provides no security against third parties sniffing usernames and passwords unless the connection is secured via SSL.
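
To show how little protection the encoding offers, this short Python sketch builds and then reverses an Authorization value using only the standard base64 module. The function names encode_basic and decode_basic are hypothetical, and the sketch assumes the username itself contains no colon.

```python
import base64

def encode_basic(username, password):
    """Join username and password with a colon and Base64-encode the result."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

def decode_basic(field_value):
    """Recover (username, password) from an Authorization field value."""
    scheme, _, token = field_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError(f"not basic authentication: {scheme!r}")
    decoded = base64.b64decode(token).decode("utf-8")
    # Split on the first colon; the password may itself contain colons.
    username, _, password = decoded.partition(":")
    return username, password

print(encode_basic("user", "password"))            # Basic dXNlcjpwYXNzd29yZA==
print(decode_basic("Basic dXNlcjpwYXNzd29yZA=="))  # ('user', 'password')
```

Anyone who can read the request can run the second function, which is why the credentials are only safe when the connection itself is encrypted.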

The server handles authentication and authorization transparently for you. As we will see in the next chapter, you may access the login name from your CGI scripts but not the password.




Copyright © 2001 O'Reilly & Associates. All rights reserved.