Programming with LWP Classes (Perl & LWP)

3.2. Programming with LWP Classes

The first step in writing a program that uses the LWP classes is to create and initialize the browser object, which can be used throughout the rest of the program. You need a browser object to perform HTTP requests, and although you could use several browser objects per program, I've never run into a reason to use more than one.

The browser object can use a proxy (a server that fetches web pages for you, such as a firewall or a web cache like Squid). It's good form to check the environment for proxy settings by calling env_proxy( ), which reads them from environment variables such as http_proxy:

use LWP::UserAgent;
my $browser = LWP::UserAgent->new( );
$browser->env_proxy( ); # if we're behind a firewall
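
To see what env_proxy( ) picks up, here is a minimal sketch that sets the conventional http_proxy environment variable by hand and then asks the browser object what it found. The proxy host proxy.example.com and port 3128 are made up for illustration; in real use the variable would already be set by your shell or your site's configuration.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $found;
{
    # Hypothetical proxy address, set here only so the example is
    # self-contained. local() restores the environment at block exit.
    local $ENV{HTTP_PROXY};   # hide any uppercase variant for this demo
    local $ENV{http_proxy} = 'http://proxy.example.com:3128/';

    my $browser = LWP::UserAgent->new( );
    $browser->env_proxy( );   # reads the *_proxy environment variables

    # With a single argument, proxy( ) reports the proxy for that scheme.
    $found = $browser->proxy('http');
}
print "HTTP requests would go through: $found\n";
```

If no proxy variables are set, env_proxy( ) simply leaves the browser object alone, which is why it's safe to call unconditionally.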

Creating the object and calling env_proxy( ) is all the initialization that most user agents will ever need. After that, you usually won't do anything with the browser object for the rest of the program aside from calling its get( ), head( ), or post( ) methods, which fetch what's at a URL or perform an HTTP HEAD or POST request on it. For example:

my $url = 'http://www.guardian.co.uk/';
my $response = $browser->get($url);
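
The three methods differ only in the kind of HTTP request they send. One way to see the difference without touching the network is to build the equivalent request objects with HTTP::Request::Common, a module distributed alongside LWP, and inspect them. This is only a sketch: www.example.com and the form field q are placeholders, not anything from a real site.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(GET HEAD POST);

# Placeholder URL and form data, for illustration only.
my $search_url = 'http://www.example.com/search';

my $get  = GET($search_url);                     # ask for the document itself
my $head = HEAD($search_url);                    # ask for the headers only
my $post = POST($search_url, [ q => 'lwp' ]);    # send form data in the body

# Show the method, URL, and body of each request.
for my $request ($get, $head, $post) {
    printf "%-4s %s  body: '%s'\n",
        $request->method, $request->uri, $request->content;
}
```

A request object built this way can be sent with $browser->request($post); the get( ), head( ), and post( ) methods do the building and the sending in one step.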

Then you call methods on the response to check the status, extract the content, and so on. For example, this code checks to make sure we successfully fetched an HTML document that isn't worryingly short, then prints a message depending on whether the words "Madonna" or "Arkansas" appear in the content:

die "Hmm, error \"", $response->status_line( ),
  "\" when getting $url"  unless $response->is_success( );
my $content_type = $response->content_type( );
die "Hm, unexpected content type $content_type from $url"
   unless $content_type eq 'text/html';
my $content = $response->content( );
die "Odd, the content from $url is awfully short!"
   if length($content) < 3000;
if($content =~ m/Madonna|Arkansas/i) {
   print "<!-- The news today is IMPORTANT -->\n",
         $content;
} else {
   print "$url has no news of ANY CONCEIVABLE IMPORTANCE!\n";
}
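
Because these checks only call methods on the response object, they can be tried out without a network connection by handing them a response built by hand. The sketch below wraps the same three tests in a subroutine; the name response_problem, the $min_length parameter, and the sample responses are all invented for this illustration.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: returns a description of what is wrong with a
# response, or undef if it passes all three checks from the text.
sub response_problem {
    my ($response, $min_length) = @_;
    return 'error "' . $response->status_line( ) . '"'
        unless $response->is_success( );
    my $content_type = $response->content_type( );
    return "unexpected content type $content_type"
        unless $content_type eq 'text/html';
    return 'content is awfully short'
        if length($response->content( )) < $min_length;
    return;
}

# Hand-built stand-ins for what $browser->get($url) would return.
my $good  = HTTP::Response->new(200, 'OK',
    [ 'Content-Type' => 'text/html' ], '<html>' . ('x' x 4000) . '</html>');
my $plain = HTTP::Response->new(200, 'OK',
    [ 'Content-Type' => 'text/plain' ], 'x' x 4000);
my $gone  = HTTP::Response->new(404, 'Not Found');

for my $response ($good, $plain, $gone) {
    my $problem = response_problem($response, 3000);
    print defined $problem ? "Rejected: $problem\n" : "Looks fine\n";
}
```

Returning a message instead of calling die lets the caller decide whether a bad response should end the program or just be skipped.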

As you see, the response object contains all the data from the web server's response (or, if the server couldn't be reached, an error message saying so), and we use method calls to get at the data. There are accessors for the different parts of the response (e.g., status_line( ) for the status line) and convenience methods such as is_success( ) that tell us whether the response was successful.
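
As a rough illustration of those accessors, the sketch below builds two responses by hand and prints a few of their parts. The second one imitates the kind of response LWP hands back when it can't reach the server at all; the message text used here is made up, and a real one will vary with the failure.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hand-built examples; in a real program these come from $browser->get(...).
my $ok = HTTP::Response->new(200, 'OK',
    [ 'Content-Type' => 'text/html; charset=UTF-8' ], '<html>...</html>');
# Made-up message standing in for a connection failure.
my $unreachable = HTTP::Response->new(500,
    "Can't connect to www.example.com:80");

for my $response ($ok, $unreachable) {
    # In scalar context, content_type( ) gives just the media type,
    # without parameters such as charset.
    my $type = $response->content_type( );
    print 'Status line:  ', $response->status_line( ), "\n";
    print 'Code:         ', $response->code( ), "\n";
    print 'Message:      ', $response->message( ), "\n";
    print 'Content type: ', ($type eq '' ? '(none)' : $type), "\n";
    print 'Succeeded?    ', ($response->is_success( ) ? 'yes' : 'no'), "\n\n";
}
```

Either way you get back an object with the same methods, so one is_success( ) test covers both an error from the server and a failure to reach it.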

And that's a working and complete LWP program!