20.1.3. Discussion
The right library makes life easier, and the LWP modules are the
right ones for this task. As you can see from the Solution, LWP makes
it trivial.
The get function from LWP::Simple returns
undef on error, so check for errors this way:
use LWP::Simple;
unless (defined ($content = get $URL)) {
    die "could not get $URL\n";
}
When called that way, however, you can't determine the cause of the
error. For this and other elaborate processing, you'll have to go
beyond LWP::Simple to LWP::UserAgent, as Example 20-1 does.
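A minimal sketch of that next step, assuming all you want is the reason for a failure: LWP::UserAgent's get method returns an HTTP::Response object, and its is_success and status_line methods tell you what went wrong. The helper name fetch_or_explain is hypothetical, not part of LWP.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# fetch_or_explain is a hypothetical helper, not part of LWP. It returns the
# raw body on success and dies with the response's status line on failure.
sub fetch_or_explain {
    my ($url)    = @_;
    my $ua       = LWP::UserAgent->new();
    my $response = $ua->get($url);
    die "could not get $url: ", $response->status_line, "\n"
        unless $response->is_success;
    return $response->content;
}

# Usage:
#   my $content = eval { fetch_or_explain($URL) };
#   warn $@ unless defined $content;   # message ends with the status line
```

Failures that LWP detects on its own, such as a host it cannot connect to, also come back as a response object (typically with a 500 code), so the same is_success check covers them.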
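Example 20-1 passes its argument through URI::Heuristic's uf_urlstr,
which expands abbreviated forms such as www.perl.com into full URLs.
Although those simple forms aren't legitimate URLs (their format is
not in the URI specification), Netscape tries to guess the URLs they
stand for. Because Netscape does it, most other browsers do, too.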
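To see the guessing in isolation, here is a small sketch that runs a few abbreviated forms through uf_urlstr, the URI::Heuristic function used in Example 20-1. The sample inputs are arbitrary. Bare words such as perl are deliberately left out, because URI::Heuristic may try DNS lookups (for names like www.perl.com) to expand them, so their result depends on the network.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI::Heuristic qw(uf_urlstr);

# Arbitrary sample inputs: only the last is already a complete URL.
my @inputs = ('www.perl.com', 'ftp.funet.fi', '/etc/passwd', 'http://www.perl.com/');

# uf_urlstr returns its best-guess URL as a string; a string that already
# carries a scheme is passed through unchanged.
my %guess = map { $_ => uf_urlstr($_) } @inputs;

printf "%-22s => %s\n", $_, $guess{$_} for @inputs;
```

The leading www. and ftp. prefixes pick the scheme, and a leading slash is treated as a local file name.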
Example 20-1. titlebytes
#!/usr/bin/perl -w
# titlebytes - find the title and size of documents
use strict;
use LWP::UserAgent;
use HTTP::Response;
use URI::Heuristic;

my $raw_url = shift or die "usage: $0 url\n";
my $url = URI::Heuristic::uf_urlstr($raw_url);
$| = 1;                                  # to flush next line
printf "%s =>\n\t", $url;

# bogus user agent
my $ua = LWP::UserAgent->new();
$ua->agent("Schmozilla/v9.14 Platinum"); # give it time, it'll get there

# bogus referrer to perplex the log analyzers
my $response = $ua->get($url, Referer => "http://wizard.yellowbrick.oz");

if ($response->is_error()) {
    printf " %s\n", $response->status_line;
} else {
    my $content = $response->content();
    my $bytes   = length $content;
    my $count   = ($content =~ tr/\n/\n/);   # count newlines
    printf "%s (%d lines, %d bytes)\n",
        $response->title() || "(no title)", $count, $bytes;
}
When run, the program produces output like this:
% titlebytes http://www.tpj.com/
http://www.tpj.com/ =>
The Perl Journal (109 lines, 4530 bytes)
Yes, "referer" is not how "referrer" should be spelled. The standards
people got it wrong when they misspelled the Referer header (which CGI
exposes as HTTP_REFERER). Please use double r's when referring to
things in English.
The first argument to the get method is the URL,
and subsequent pairs of arguments are headers and their values.
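Here is a sketch of passing several headers at once. So that it runs without a network, it installs a request_send handler (a standard LWP::UserAgent hook) that records the outgoing request and returns a canned response instead of contacting the server; remove that handler to make a real request. The URL and the extra header values are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

my $ua = LWP::UserAgent->new();
$ua->agent("Schmozilla/v9.14 Platinum");

# Demo only: a request_send handler that returns a response short-circuits
# the network, so we can inspect exactly what get() would have sent.
my $sent;
$ua->add_handler(request_send => sub {
    my ($request) = @_;
    $sent = $request;
    return HTTP::Response->new(200, "OK", ["Content-Type" => "text/plain"], "stub\n");
});

# The URL comes first; every pair after it is a header name and its value.
my $response = $ua->get("http://www.example.com/",
    Referer           => "http://wizard.yellowbrick.oz",
    Accept            => "text/html",
    "Accept-Language" => "en",
);

print $sent->as_string;   # request line plus the headers actually sent
```

The User-Agent string set with agent is merged in as a default header, so it travels alongside the headers given to get.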