Writing Apache Modules with Perl and C

By:	Lincoln Stein and Doug MacEachern
Published:	O'Reilly & Associates, Inc. - March 1999

Chapter 10 - C API Reference Guide, Part I / Processing Requests
Reading the Request Body

Apache automatically reads all the incoming request header fields, stopping at the carriage-return/linefeed pair that terminates the HTTP header. This information is used to set up the request_rec, server_rec, and connection_rec structures. The server will not automatically read the request body, the optional portion of the request which may contain fill-out form fields or uploaded documents.

Many custom handlers will be able to do their work directly from the information stored in the request_rec and server_rec. The exception is content handlers, which frequently need to process the incoming request body submitted by the POST, PUT, and possibly other methods.

There are two complications when reading the request body. The first is the possibility that the remote client will break the connection before it has sent all the data it has declared it is sending. For this reason you have to set a timeout during the read so that the handler does not hang indefinitely. The timeout API is discussed later in this chapter. The second is the existence of the HTTP/1.1 "chunked" data type, in which the data is transmitted in smallish chunks, each preceded by a byte count. Sending chunked content data is different from submitting the request body normally because there is no Content-length header in the request to tell you in advance how many bytes to expect. In general, modules should request a client read policy of REQUEST_CHUNKED_ERROR to force the browser to use non-chunked (standard) data transfer mode.
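To make the chunked format concrete, here is a minimal standalone sketch of a decoder for a chunked body that is already held in memory. It is not part of the Apache API: the function name dechunk() and its signature are our own invention for illustration, and the sketch assumes that no chunk extensions or trailer fields are present.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an Apache function.
 * Decodes a chunked body of inlen bytes into out (capacity outsz).
 * Returns the number of decoded bytes, or -1 if the input is malformed,
 * truncated, or too large for out. */
static long dechunk(const char *in, size_t inlen, char *out, size_t outsz)
{
    size_t pos = 0, used = 0;

    for (;;) {
        size_t len = 0, digits = 0;

        /* Each chunk starts with its byte count in hexadecimal. */
        while (pos < inlen && isxdigit((unsigned char)in[pos])) {
            int c = tolower((unsigned char)in[pos++]);
            len = len * 16 + (size_t)(c <= '9' ? c - '0' : c - 'a' + 10);
            digits++;
        }
        /* The count must exist, stay small, and end with CRLF. */
        if (digits == 0 || digits > 7 || inlen - pos < 2 ||
            in[pos] != '\r' || in[pos + 1] != '\n') {
            return -1;
        }
        pos += 2;

        /* A zero-length chunk marks the end of the body. */
        if (len == 0) {
            return (long)used;
        }

        /* We need len data bytes plus the CRLF that follows them. */
        if (inlen - pos < len + 2 || outsz - used < len) {
            return -1;
        }
        memcpy(out + used, in + pos, len);
        used += len;
        pos += len;
        if (in[pos] != '\r' || in[pos + 1] != '\n') {
            return -1;
        }
        pos += 2;
    }
}
```

Notice that the decoder cannot know the total size until it reaches the zero-length chunk. This is the practical consequence of having no Content-length header to consult in advance.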

You should set a hard timeout prior to making the first client data read by calling the ap_hard_timeout() function described later. To deal properly with chunked data, you will establish a "read policy" chosen from among the following constants defined in include/httpd.h:

REQUEST_NO_BODY

This is the simplest policy of all. It causes the request body API functions to return a 413 "HTTP request entity too large" error if the submitted request has any content data at all!

REQUEST_CHUNKED_ERROR

This is the next simplest policy. The request body API functions will allow the browser to submit ordinary content data but will reject attempts to send chunked data, with a 411 "HTTP length required" error. If the client follows the recommendations of the HTTP/1.1 protocol, it will resubmit the content using the nonchunked method. This read policy is the recommended method and guarantees that you will always get a Content-length header if there is a request body.

REQUEST_CHUNKED_DECHUNK

If this read policy is specified, Apache will accept both chunked and nonchunked data. If the request is chunked, it will buffer it and return to you the number of bytes you request in ap_get_client_block() (described later in this section).

REQUEST_CHUNKED_PASS

Under this read policy, Apache will accept both chunked and nonchunked data. If the data is chunked, no attempt is made to buffer it. Your calls to ap_get_client_block() must be prepared to receive a buffer-load of data exactly as long as the chunk length.

The Apache request body API consists of three functions:

int ap_setup_client_block (request_rec *r, int read_policy)

Before reading any data from the client, you must call ap_setup_client_block(). This tells Apache you are ready to read from the client and sets up its internal state (in the request record) to keep track of where you are in the read process. The function has two arguments: the current request_rec and the read policy selected from the constants in the preceding list. This function will return OK if Apache was successful in setting up for the read or an HTTP status code if an error was encountered. If an error result code is returned, you should use it as the status value that is returned from your handler.
The error codes that can be generated depend on the read policy you specify. If REQUEST_CHUNKED_ERROR was specified, then this call will return HTTP_LENGTH_REQUIRED if the client tries to submit a chunked request body. If REQUEST_NO_BODY was specified, then this function will return HTTP_REQUEST_ENTITY_TOO_LARGE if any request body is present. HTTP_BAD_REQUEST will be returned for a variety of client errors, such as sending a non-numeric Content-length field.
A side effect of ap_setup_client_block() is to convert the value of Content-length into an integer and store it in the remaining field of the request_rec.

int ap_should_client_block (request_rec *r)

Just before beginning to read from the client, you must call ap_should_client_block(). It will return a Boolean value indicating whether you should go ahead and read, or abort. Despite its name, this function is more useful for the information it provides to the browser than for the status information it returns to you. When the HTTP/1.1 protocol is in use, ap_should_client_block() transmits a 100 "Continue" message to the waiting browser, telling it that the time has come to transmit its content.

long ap_get_client_block (request_rec *r, char *buffer, int bufsiz)

This is the function that actually reads data from the client. You provide the current request record, a buffer of the appropriate size, and a count of the maximum number of bytes you wish to receive. ap_get_client_block() will read up to the specified number of bytes and return the count received. If you are handling nonchunked data, do not try to read more than the number of bytes declared in Content-length because this may cause the attempted read to block indefinitely.
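The way the read policy determines the result of ap_setup_client_block() can be summarized in a small standalone model. This is a simplified sketch of the outcomes described above, not Apache's actual implementation. The POLICY_ and ST_ names and the setup_status() function are stand-ins we made up so that the code compiles without the Apache headers; only the numeric HTTP status codes are real.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Illustrative stand-ins for the REQUEST_* read policies. */
enum { POLICY_NO_BODY, POLICY_CHUNKED_ERROR,
       POLICY_CHUNKED_DECHUNK, POLICY_CHUNKED_PASS };

/* Illustrative stand-ins for OK and the HTTP_* status constants. */
enum { ST_OK = 0, ST_BAD_REQUEST = 400,
       ST_LENGTH_REQUIRED = 411, ST_ENTITY_TOO_LARGE = 413 };

/* Hypothetical model: content_length is the raw header value, or NULL
 * if the header is absent. On success *remaining receives the parsed
 * length, mirroring the "remaining" field of the request_rec. */
static int setup_status(int policy, int is_chunked,
                        const char *content_length, long *remaining)
{
    *remaining = 0;

    if (is_chunked) {
        /* Chunked bodies are refused outright under this policy. */
        if (policy == POLICY_CHUNKED_ERROR) {
            return ST_LENGTH_REQUIRED;
        }
    }
    else if (content_length != NULL) {
        const char *p;
        long n = 0;

        for (p = content_length; *p != '\0'; p++) {
            /* Non-numeric or absurdly long values are client errors. */
            if (!isdigit((unsigned char)*p) || n > (LONG_MAX - 9) / 10) {
                return ST_BAD_REQUEST;
            }
            n = n * 10 + (*p - '0');
        }
        *remaining = n;
    }

    /* REQUEST_NO_BODY rejects any request that carries a body. */
    if (policy == POLICY_NO_BODY && (is_chunked || *remaining > 0)) {
        return ST_ENTITY_TOO_LARGE;
    }
    return ST_OK;
}
```

A handler that receives anything other than the OK value from the real function should simply return that value as its own status, as described earlier.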

In the code example shown in Example 10-4, we begin by calling ap_setup_client_block() to convert the Content-length header to an integer and store the value in the remaining field of the request_rec. We then use the value of remaining to allocate a buffer, rbuf, large enough to hold the entire contents. We next set up a hard timeout and then enter a loop in which we call ap_get_client_block() repeatedly, transferring the read data to the buffer piece by piece. The length of each piece we read is at most the value of HUGE_STRING_LEN, a constant defined in httpd.h. The timeout alarm is reset with ap_reset_timeout() after each successful read. When the data has been read completely, we call ap_kill_timeout() to turn off the timeout alarm, and return.

Notice that we call ap_setup_client_block() with a read policy of REQUEST_CHUNKED_ERROR. This makes the program logic simpler because it forces the client to use the nonchunked transfer method.

Example 10-4. Chunked Client Input

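#include <string.h>         /* memcpy() */
#include "httpd.h"
#include "http_protocol.h"  /* ap_*_client_block() */
#include "http_main.h"      /* ap_hard_timeout() and friends */

static int util_read(request_rec *r, const char **rbuf)
{
    int rc;

    *rbuf = "";   /* an absent body reads as an empty string */

    /* Refuse chunked input; on success r->remaining holds Content-length. */
    if ((rc = ap_setup_client_block(r, REQUEST_CHUNKED_ERROR)) != OK) {
        return rc;
    }

    if (ap_should_client_block(r)) {
        char argsbuffer[HUGE_STRING_LEN];
        long rsize, len_read, rpos = 0;
        long length = r->remaining;
        /* Zero-filled, so the result is always NUL-terminated. */
        char *buf = ap_pcalloc(r->pool, length + 1);

        *rbuf = buf;
        ap_hard_timeout("util_read", r);

        while ((len_read =
                ap_get_client_block(r, argsbuffer, sizeof(argsbuffer))) > 0) {
            ap_reset_timeout(r);
            /* Never copy past the end of the buffer we allocated. */
            if ((rpos + len_read) > length) {
                rsize = length - rpos;
            }
            else {
                rsize = len_read;
            }
            memcpy(buf + rpos, argsbuffer, rsize);
            rpos += rsize;
        }

        ap_kill_timeout(r);
    }
    return rc;
}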
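The rule that you should never ask for more bytes than Content-length declares can be tried outside of Apache with a mock data source. The sketch below is only an illustration under our own assumptions: struct mock_client, mock_read(), and read_declared() are hypothetical names, with mock_read() playing the role of ap_get_client_block() and delivering data in small pieces the way a slow connection would.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the client connection. */
struct mock_client {
    const char *data;      /* bytes the client has yet to send */
    size_t left;           /* how many of them remain */
    size_t max_per_read;   /* largest piece delivered per call */
};

/* Plays the role of ap_get_client_block(): copies up to bufsiz bytes
 * and returns the count, or 0 once the client has nothing more. */
static long mock_read(struct mock_client *c, char *buf, size_t bufsiz)
{
    size_t n = bufsiz < c->left ? bufsiz : c->left;

    if (n > c->max_per_read) {
        n = c->max_per_read;
    }
    memcpy(buf, c->data, n);
    c->data += n;
    c->left -= n;
    return (long)n;
}

/* Reads at most "declared" bytes into out, never requesting more than
 * is still outstanding. Returns the number of bytes stored; a smaller
 * count than declared means the client stopped early. */
static size_t read_declared(struct mock_client *c, char *out, size_t declared)
{
    char piece[8];
    size_t got = 0;

    while (got < declared) {
        size_t want = declared - got;
        long n;

        if (want > sizeof(piece)) {
            want = sizeof(piece);
        }
        n = mock_read(c, piece, want);
        if (n <= 0) {
            break;
        }
        memcpy(out + got, piece, (size_t)n);
        got += (size_t)n;
    }
    return got;
}
```

Capping each request at the outstanding count means the loop ends as soon as the declared length has arrived, without issuing one more read that could block.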

No mainstream web client currently uses the chunked data transfer method, so we have not yet had the occasion to write code to handle it. Should chunked data transfer become more widely adopted, check the www.modperl.com site for code examples illustrating this aspect of the API.

Because POST requests are used almost exclusively to submit the contents of fill-out forms, you'd think that there would be an API specially designed for recovering and parsing this information. Unfortunately there isn't, so you'll have to roll your own. Example 10-5 defines a function called read_post() that shows you the basic way to do this. You pass read_post() the request record and an empty table pointer. The function reads in the request body, parses the URL-encoded form data, and fills the table up with the recovered key/value pairs. It returns OK on success, DECLINED if the content type is not the expected one, or the HTTP status code passed back by the read. These return values are just for our convenience and not something required by the Apache API.

The example begins by defining a constant named DEFAULT_ENCTYPE that contains the standard MIME type for POSTed fill-out forms. Next we define the read_post() function. read_post() examines the request record's method_number field to ensure that this is a POST request. If not, it just returns OK without modifying the passed table. read_post() then examines the incoming request's Content-type field, using ap_table_get() to fetch the information from the request record's headers_in field. If the content type doesn't match the expected POST type, the function exits with a DECLINED error code.

We now read the data into a buffer using the util_read() function from Example 10-4, passing on the result code to the caller in case of error.

The last task is to parse out the key=value pairs from the request body. We begin by clearing the passed table, deleting its previous contents, if any. If a NULL pointer was passed in, we allocate a new one with ap_make_table(). We then enter a loop in which we split the buffer into segments delimited by the & character, using the handy ap_getword() function for this purpose (described in the next chapter). We then call ap_getword() again to split each segment at the = character into key/value pairs. We pass both the key and value through ap_unescape_url() to remove the URL escapes, and enter them into the table with ap_table_merge(). We use ap_table_merge() rather than ap_table_add() here in order to spare the caller the inconvenience of using ap_table_do() to recover the multiple values. The disadvantage of this choice is that values that contain commas will not be correctly handled, since ap_table_merge() uses commas to separate multiple values.
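The comma problem is easy to demonstrate in isolation. The following standalone sketch imitates the effect of merging, in which the values of a repeated key are joined into one string separated by a comma and a space. The functions merge_value() and count_values() are hypothetical helpers written for this illustration and are not part of the Apache table API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: appends val to the string already in dst,
 * joining with ", " the way merged table values are joined.
 * dst must hold a NUL-terminated string shorter than dstsz.
 * Returns 0 on success or -1 if the result would not fit. */
static int merge_value(char *dst, size_t dstsz, const char *val)
{
    size_t used = strlen(dst);
    int n = snprintf(dst + used, dstsz - used, "%s%s",
                     used > 0 ? ", " : "", val);

    return (n < 0 || (size_t)n >= dstsz - used) ? -1 : 0;
}

/* Counts the values a caller would recover by splitting on commas. */
static int count_values(const char *merged)
{
    int count = (*merged != '\0');

    for (; *merged != '\0'; merged++) {
        if (*merged == ',') {
            count++;
        }
    }
    return count;
}
```

Merging "red" and "green" gives a string that splits back into two values, as intended. A single submitted value such as "Stein, Lincoln" also splits into two, and the caller has no way to tell the two cases apart.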

Example 10-5. Reading POSTed Form Data

#define DEFAULT_ENCTYPE "application/x-www-form-urlencoded"

static int read_post(request_rec *r, table **tab)
{
    const char *data = NULL;
    const char *key, *val, *type;
    int rc = OK;

    /* Only POST requests carry form data in the body. */
    if (r->method_number != M_POST) {
        return rc;
    }

    /* A missing Content-Type header is treated like a mismatch. */
    type = ap_table_get(r->headers_in, "Content-Type");
    if (type == NULL || strcasecmp(type, DEFAULT_ENCTYPE) != 0) {
        return DECLINED;
    }

    if ((rc = util_read(r, &data)) != OK) {
        return rc;
    }

    if (*tab) {
        ap_clear_table(*tab);
    }
    else {
        *tab = ap_make_table(r->pool, 8);
    }

    /* Split the body on '&', then split each segment at '='. */
    while (data && *data && (val = ap_getword(r->pool, &data, '&'))) {
        key = ap_getword(r->pool, &val, '=');

        ap_unescape_url((char*)key);
        ap_unescape_url((char*)val);

        ap_table_merge(*tab, key, val);
    }

    return OK;
}
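The splitting and unescaping steps of Example 10-5 can also be exercised without a running server. The sketch below is a standalone approximation that uses only the C library: form_unescape() and parse_form() are hypothetical names of our own, the parsed pairs are returned through caller-supplied arrays rather than an Apache table, and the input buffer is modified in place. We assume that spaces arrive encoded as + signs, as browsers send them, so the sketch translates them back explicitly.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hexadecimal digit; the caller checks isxdigit() first. */
static int hexval(int c)
{
    return isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
}

/* Hypothetical helper: decodes %XX escapes and '+' in place. */
static char *form_unescape(char *s)
{
    char *r = s, *w = s;

    while (*r != '\0') {
        if (*r == '+') {
            *w++ = ' ';
            r++;
        }
        else if (*r == '%' && isxdigit((unsigned char)r[1]) &&
                 isxdigit((unsigned char)r[2])) {
            *w++ = (char)(hexval((unsigned char)r[1]) * 16 +
                          hexval((unsigned char)r[2]));
            r += 3;
        }
        else {
            *w++ = *r++;
        }
    }
    *w = '\0';
    return s;
}

/* Hypothetical helper: splits body on '&', then each segment at the
 * first '=', storing up to max decoded pairs in keys[] and vals[].
 * Empty segments are skipped; a key without '=' gets an empty value.
 * Returns the number of pairs stored. */
static int parse_form(char *body, char *keys[], char *vals[], int max)
{
    int count = 0;
    char *seg = body;

    while (seg != NULL && count < max) {
        char *next = strchr(seg, '&');

        if (next != NULL) {
            *next++ = '\0';
        }
        if (*seg != '\0') {
            char *val = strchr(seg, '=');

            if (val != NULL) {
                *val++ = '\0';
            }
            else {
                val = seg + strlen(seg);
            }
            keys[count] = form_unescape(seg);
            vals[count] = form_unescape(val);
            count++;
        }
        seg = next;
    }
    return count;
}
```

Given the body name=Lincoln+Stein&lang=C%2FPerl, parse_form() stores the pairs name/"Lincoln Stein" and lang/"C/Perl". Keeping the pairs separate, rather than merging repeated keys, avoids the comma ambiguity at the cost of leaving duplicate handling to the caller.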
