HTTP::Response Objects (Perl & LWP)

3.5. HTTP::Response Objects

You have to manually create most objects your programs work with by calling an explicit constructor, with the syntax ClassName->new( ). HTTP::Response objects are a notable exception. You never need to call HTTP::Response->new( ) to make them; instead, you just get them back as the result of a request made with one of the request methods (get( ), post( ), and head( )).

That is, when writing web clients, you never need to create a response yourself. Instead, a user agent creates it for you, to encapsulate the results of a request it made. You do, however, interrogate a response object's attributes. For example, the code( ) method returns the HTTP status code:

print "HTTP status: ", $response->code( ), "\n";
HTTP status: 404

HTTP::Response objects also have convenience methods. For example, is_success( ) returns a true value if the response had a successful HTTP status code, or false if it didn't (e.g., 404, 403, or 500). Always check your responses, like so:

die "Couldn't get the document"
  unless $response->is_success( );

You might prefer something a bit more verbose, like this:

# Given $response and $url ...
die "Error getting $url\n", $response->status_line
  unless $response->is_success( );
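
If you want to try this check without touching the network, one option is to build a response by hand, purely for testing. The following is a minimal sketch under that assumption: the URL and the helper name content_or_die are made up for illustration, and in a real client the response would come back from the user agent rather than from HTTP::Response->new( ).

```perl
use strict;
use warnings;
use HTTP::Response;

# Hand-built stand-in for what a user agent would return (testing only).
my $url      = 'http://www.example.com/missing.html';   # assumed URL
my $response = HTTP::Response->new(404, 'Not Found');

# Hypothetical helper: return the content on success, or die with
# the URL and the full status line.
sub content_or_die {
    my ($response, $url) = @_;
    die "Error getting $url: ", $response->status_line, "\n"
        unless $response->is_success;
    return $response->content;
}

# Trap the failure so the script can report it and carry on.
my $ok    = eval { content_or_die($response, $url); 1 };
my $error = $@;
print "Caught: $error" unless $ok;
```

Wrapping the check in a subroutine keeps the error message in one place; whether to die or merely warn is a choice that depends on your program.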

3.5.1. Status Line

The status_line( ) method returns the entire HTTP status line:

$sl = $response->status_line( );

This includes both the numeric code and the explanation. For example:

$response = $browser->get("http://www.cpan.org/nonesuch");
print $response->status_line( );
404 Not Found

To get only the status code, use the code( ) method:

$code = $response->code( );

To access only the explanatory message, use the message( ) method:

$msg = $response->message( );

For example:

$response = $browser->get("http://www.cpan.org/nonesuch");
print $response->code( ), " (that means ", $response->message( ), ")\n";
404 (that means Not Found)

Four methods test for types of status codes in the response: is_error( ), is_success( ), is_redirect( ), and is_info( ). They return true if the status code corresponds, respectively, to an error, a successful fetch, a redirection, or an informational response (e.g., "102 Processing").

$boolean = $response->is_error( );
$boolean = $response->is_success( );
$boolean = $response->is_redirect( );
$boolean = $response->is_info( );

Exactly which codes count as which sort of status is explained in greater detail in Appendix B, "HTTP Status Codes".
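
To see the four tests side by side, you could run them over a handful of status codes. This is only a sketch: the responses are built by hand so that it runs offline, and describe_status is a hypothetical helper name, not part of LWP.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: name the class of a response's status code.
sub describe_status {
    my ($response) = @_;
    return 'informational' if $response->is_info;
    return 'success'       if $response->is_success;
    return 'redirection'   if $response->is_redirect;
    return 'error'         if $response->is_error;
    return 'unknown';
}

# Hand-built responses (testing only); with no message supplied,
# status_line( ) falls back to the standard text for the code.
for my $code (102, 200, 301, 404, 500) {
    my $response = HTTP::Response->new($code);
    printf "%-28s => %s\n", $response->status_line, describe_status($response);
}
```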

3.5.2. Content

Most responses contain content after their headers. This content is accessible with the content( ) method:

$the_file_data = $response->content( );

In some cases, it's easier (and more efficient) to get a scalar reference to the content, instead of the value of the content itself. For that, use the content_ref( ) method:

$data_ref = $response->content_ref( );

For example, in Chapter 7, "HTML Processing with Tokens", we use a class called HTML::TokeParser that parses HTML starting with a reference to a big block of HTML source. We could use that module to parse the HTML in an HTTP::Response object by using do{ my $x = $response->content( ); \$x}, but we can avoid the unnecessary copying by just using $response->content_ref( ).
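
Here is a small sketch of working through the reference rather than a copy. The response and its HTML are made up and built by hand so the example runs offline; the tag counting is just a stand-in for whatever processing you have in mind.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hand-built response with a made-up HTML body (testing only).
my $response = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html' ],
    "<html><body><p>Hello</p><p>World</p></body></html>",
);

# A reference to the content held inside the response, not a copy of it.
my $html_ref = $response->content_ref;

# Dereference with $$ wherever a string is needed.
my $paragraphs = () = $$html_ref =~ /<p\b/gi;
print "Bytes: ", length($$html_ref), ", paragraphs: $paragraphs\n";
```

The same reference is the kind of value you would hand to a parser that accepts a reference to a block of HTML source.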

3.5.3. Headers

To fetch the value of an HTTP header in the response, use the header( ) method:

$value = $response->header(header_name);

For example, if you know there will be useful data in a header called Description, access it as $response->header('Description'). The header( ) method returns undef if there is no such header in this response.

HTTP::Response provides some methods for accessing the most commonly used header fields:

$type = $response->content_type( );

The Content-Type header contains the MIME type of the body. This is "text/html" for HTML files, "image/jpeg" for JPEG files, and so on. Appendix C, "Common MIME Types" contains a list of common MIME types.

$length = $response->content_length( );

The Content-Length header contains the size of the body (in bytes) sent from the server, but it is not always present. If you need the real length of the response, use length($response->content).

$lm = $response->last_modified( );

The Last-Modified header contains a timestamp indicating when the content was last modified, but it is sometimes not present.

$encoding = $response->content_encoding( );

The Content-Encoding header names any content coding, such as gzip or deflate, that was applied to the body for transfer. It does not name the document's character set; that is declared in the charset parameter of the Content-Type header (for example, text/html; charset=utf-8). The most common declared character set is iso-8859-1, meaning Latin-1. An increasingly common runner-up is utf-8, meaning Unicode expressed in the UTF-8 encoding. Less-common encodings are listed in Appendix E, "Common Content Encodings". But be warned: the charset declaration is occasionally inaccurate or missing, in cases where content is clearly in one encoding but the document fails to declare it as such. For example, a document might be in Chinese in the big5 encoding but might erroneously report itself as being in iso-8859-1.

This brings us to a regrettably even less-used header:

$language = $response->content_language( );

Rarely present, the Content-Language header contains the language tag(s) for the document's content. Appendix D, "Language Tags" lists common language tags.

If you want to get all the headers as one string, call $response->headers_as_string. This is useful for debugging, as in:

print "Weird response!!\n",
  $response->headers_as_string, "\n\n"
unless $response->content_type( );
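
The header methods above can be tried out on a hand-built response. This is a sketch for offline experimentation only: the header values are invented, and Description is the same made-up header name used earlier.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hand-built response with invented headers (testing only).
my $response = HTTP::Response->new(
    200, 'OK',
    [
        'Content-Type'     => 'text/html; charset=utf-8',
        'Content-Length'   => 11,
        'Content-Language' => 'en',
        'Description'      => 'A test page',
    ],
    'Hello world',
);

my $type        = $response->content_type;       # media type only, no parameters
my $declared    = $response->content_length;     # what the header claims
my $actual      = length $response->content;     # what actually arrived
my $description = $response->header('Description');
my $missing     = $response->header('X-No-Such-Header');

print "Type: $type\n";
print "Declared length $declared, actual length $actual\n";
print "Description: $description\n";
print "X-No-Such-Header is ", (defined $missing ? "'$missing'" : "not present"), "\n";
```

Comparing the declared and actual lengths is a cheap sanity check when you suspect a truncated transfer.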

3.5.4. Expiration Times

Most servers send a Date header as well as an Expires or Last-Modified header with their responses. Four methods on HTTP::Response objects use these headers to calculate the age of the document and various caching statistics.

The current_age( ) method returns the number of seconds since the server sent the document:

$age = $response->current_age( );

For example:

$age = $response->current_age( );
$days  = int($age/86400);       $age -= $days * 86400;
$hours = int($age/3600);        $age -= $hours * 3600;
$mins  = int($age/60);          $age -= $mins * 60;
$secs  = $age;
print "The document is $days days, $hours hours, $mins minutes, and $secs seconds old.\n";
The document is 0 days, 0 hours, 5 minutes, and 33 seconds old.

The freshness_lifetime( ) method returns the number of seconds until the document expires, counted from the time the server sent it:

$lifetime = $response->freshness_lifetime( );

For example:

$time = $response->freshness_lifetime( );
$days  = int($time/86400);       $time -= $days * 86400;
$hours = int($time/3600);        $time -= $hours * 3600;
$mins  = int($time/60);          $time -= $mins * 60;
$secs  = int($time);
print "The document expires in $days days, $hours hours, $mins minutes, and $secs seconds.\n";
The document expires in 0 days, 23 hours, 6 minutes, and 15 seconds.
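
Both examples above repeat the same arithmetic, so you might factor it into a subroutine. The following is one possible sketch; split_duration is a hypothetical name, and the input of 83175 seconds is chosen only because it corresponds to the 23 hours, 6 minutes, and 15 seconds shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: break a number of seconds into
# (days, hours, minutes, seconds).
sub split_duration {
    my ($seconds) = @_;
    $seconds = int $seconds;
    my $days  = int($seconds / 86400);  $seconds -= $days  * 86400;
    my $hours = int($seconds / 3600);   $seconds -= $hours * 3600;
    my $mins  = int($seconds / 60);     $seconds -= $mins  * 60;
    return ($days, $hours, $mins, $seconds);
}

my ($days, $hours, $mins, $secs) = split_duration(83175);
print "$days days, $hours hours, $mins minutes, and $secs seconds\n";
```

You could pass it the result of either current_age( ) or freshness_lifetime( ).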

The is_fresh( ) method returns true if the document has not expired yet:

$boolean = $response->is_fresh( );

If the document is not fresh, your program should reissue the request to the server. This is an issue only if your program runs for a long time and you keep responses for later interrogation.

The fresh_until( ) method returns the time when the document expires:

$expires = $response->fresh_until( );

For example:

$expires = $response->fresh_until( );
print "This document is good until ", scalar(localtime($expires)), "\n";
This document is good until Tue Feb 26 07:36:08 2004
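
To watch the four methods work together without a live server, you can build a response whose Date and Expires headers you control. This is a sketch under several assumptions: the timestamps are invented, and the Client-Date header (which LWP normally records when it receives a response) is set by hand so that the age calculation has something to work from.

```perl
use strict;
use warnings;
use HTTP::Date qw(time2str);
use HTTP::Response;

my $now    = time;
my $served = time2str($now - 300);    # pretend it was sent five minutes ago

# Hand-built response (testing only).
my $response = HTTP::Response->new(
    200, 'OK',
    [
        'Date'        => $served,
        'Client-Date' => $served,                  # normally recorded by LWP
        'Expires'     => time2str($now + 3600),    # an hour from now
    ],
    'cached body',
);

my $age       = $response->current_age;           # seconds since it was sent
my $lifetime  = $response->freshness_lifetime;    # measured from the Date header
my $remaining = $lifetime - $age;

printf "Age: %d s, lifetime: %d s, remaining: %d s\n", $age, $lifetime, $remaining;
print  "Fresh until ", scalar(localtime($response->fresh_until)), "\n"
    if $response->is_fresh;
```

Subtracting the age from the lifetime gives the time actually left, which is what matters for a response you have been holding on to for a while.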

3.5.5. Base for Relative URLs

An HTML document can have relative URLs in it. For example:

<img src="my_face.gif">

This generally refers to the my_face.gif that's located in the same directory as the HTML page. Turning these relative URLs into absolute URLs that can be requested via LWP is covered in the next chapter. To do that, you must know the URL of the current page.

The base( ) method returns the URL of the document in the response.

$url = $response->base( );

This base URL is normally the URL you requested but can sometimes differ: if there was a redirection (which LWP normally follows through on), the URL of the final response isn't the same as the requested URL. Moreover, the Base, Content-Base, and Content-Location headers in a response specify the address against which you resolve relative URLs. And finally, if the response content is an HTML document and has a <base href="..."> tag in its head, that definitively sets the base URL.
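
The effect of a base-setting header can be sketched with two hand-built responses. Everything here is an assumption for illustration: the URLs are invented, the responses are constructed manually so the example runs offline, and URI->new_abs( ) is used merely as one way to resolve the relative URL.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;
use URI;

my $page_url = 'http://www.example.com/people/index.html';   # assumed URL

# Case 1: no base-related headers, so base( ) falls back to the request URL.
my $plain = HTTP::Response->new(200, 'OK', [ 'Content-Type' => 'text/html' ]);
$plain->request(HTTP::Request->new(GET => $page_url));

# Case 2: a Content-Base header takes precedence over the request URL.
my $rebased = HTTP::Response->new(
    200, 'OK',
    [
        'Content-Type' => 'text/html',
        'Content-Base' => 'http://mirror.example.com/pics/',
    ],
);
$rebased->request(HTTP::Request->new(GET => $page_url));

for my $response ($plain, $rebased) {
    my $absolute = URI->new_abs('my_face.gif', $response->base);
    print "base ", $response->base, " gives $absolute\n";
}
```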

3.5.6. Debugging

When an error occurs (as indicated by the is_error( ) method), error_as_HTML( ) returns an error page in HTML:

$error_page = $response->error_as_HTML( );
print "The server said:\n<blockquote>$error_page</blockquote>\n";

Because a user agent can follow redirections and automatically answer authentication challenges, the request you gave to the user agent object might not be the request that produced the response you get back. That is, you could have asked for one URL, but that could have redirected to another, which could have redirected to yet another, producing not one response but a chain of responses. For the sake of simplicity, you get back only the one $response object, which is the last in the chain. But if you need to, you can work your way back, using the previous( ) method:

$previous_response = $response->previous( );

The previous( ) method returns undef when there is no previous response (i.e., on the response to the request you gave the user agent, at the head of the chain). Moreover, each response stores the HTTP::Request object that LWP used for making the request, and you can access it with $response->request( ). HTTP::Request objects support most of the same methods as HTTP::Response objects, notably $request->as_string, which is useful in debugging.

From each response, you can get the corresponding request and recreate the HTTP dialog. For example:

$last = $response;
while ($response) {
  print $response->code( ), " after ";
    # Or you could even dump the whole
    #   thing, with $response->as_string( )

  $last = $response;
  $response = $response->previous( );
}
print "the original request, which was:\n",
  $last->request->as_string;
 
200 after 401 after 301 after the original request, which was:
GET http://some.crazy.redirector.int/thing.html
User-Agent: libwww-perl/5.5394
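
Note that the loop above overwrites $response as it walks backward. If you would rather keep the final response intact, one alternative is to collect the chain into a list first. This is a sketch only: the redirect chain and its URLs are fabricated by hand so the example runs offline, and response_chain is a hypothetical helper name.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Build a fake two-step chain by hand (testing only): a 301 that
# leads to a 200. A real chain would be assembled by the user agent.
my @hops = (
    [ 301, 'http://www.example.com/old.html' ],
    [ 200, 'http://www.example.com/new.html' ],
);
my $response;
for my $hop (@hops) {
    my ($code, $url) = @$hop;
    my $next = HTTP::Response->new($code);
    $next->request(HTTP::Request->new(GET => $url));
    $next->previous($response) if $response;
    $response = $next;
}

# Hypothetical helper: return the chain oldest-first, leaving
# the caller's $response untouched.
sub response_chain {
    my ($response) = @_;
    my @chain;
    for (my $r = $response; $r; $r = $r->previous) {
        unshift @chain, $r;
    }
    return @chain;
}

for my $r (response_chain($response)) {
    print $r->request->uri, " => ", $r->status_line, "\n";
}
```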