20.2. The LWP Modules

The LWP modules provide the core functionality for web programming in Perl. They contain the foundations for networking applications, protocol implementations, media type definitions, and debugging support.

The modules LWP::Simple and LWP::UserAgent define client applications that implement network connections, send requests, and receive response data from servers. LWP::RobotUA is another client application used to build automated web searchers following a specified set of guidelines.

LWP::UserAgent is the primary module used in applications built with LWP. With it, you can build your own robust web client. It is also the base class for LWP::RobotUA, and LWP::Simple is built on top of it. These two modules provide a specialized set of functions for creating clients.

Additional LWP modules provide the building blocks required for web communications, but you often don't need to use them directly in your applications. LWP::Protocol implements the actual socket connections with the appropriate protocol. The most common protocol is HTTP, but mail protocols (such as SMTP), FTP for file transfers, and others can be used across networks.

LWP::MediaTypes implements the MIME definitions for media type identification and mapping to file extensions. The LWP::Debug module provides functions to help you debug your LWP applications.
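As a small illustration of the extension mapping, the sketch below asks LWP::MediaTypes to guess the media type of two file names. The file names are made up for the example, and the sketch assumes the media type table that ships with the module.

```perl
use strict;
use warnings;
use LWP::MediaTypes qw(guess_media_type);

# Map file names to MIME types using the table bundled with LWP::MediaTypes.
# The file names are hypothetical; these suffixes are resolved from the table.
my $html_type = guess_media_type('index.html');
my $gif_type  = guess_media_type('logo.gif');

print "index.html => $html_type\n";
print "logo.gif   => $gif_type\n";
```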

The following sections describe the RobotUA, Simple, and UserAgent modules of LWP.

20.2.1. LWP::RobotUA

The Robot User Agent (LWP::RobotUA) is a subclass of LWP::UserAgent and is used to create robot client applications. A robot application requests resources in an automated fashion. Robots perform such activities as searching, mirroring, and surveying. Some robots collect statistics, while others wander the Web and summarize their findings for a search engine.

The LWP::RobotUA module defines methods to help program robot applications and observes the Robot Exclusion Standard, which web server administrators can use (through a robots.txt file on their site) to keep robots away from certain (or all) areas of the site.

The constructor for an LWP::RobotUA object looks like this:

$rob = LWP::RobotUA->new(agent_name, email, [$rules]);

The first parameter, agent_name, is the user agent identifier used for the value of the User-Agent header in the request. The second parameter is the email address of the person using the robot, and the optional third parameter is a reference to a WWW::RobotRules object, which is used to store the robot rules for a server. If you omit the third parameter, the LWP::RobotUA module requests the robots.txt file from every server it contacts and generates its own WWW::RobotRules object.
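The following is a minimal sketch of the constructor in use, assuming you want to supply your own WWW::RobotRules object. The robot name, email address, host, and robots.txt content are all invented for illustration, and nothing here contacts a server.

```perl
use strict;
use warnings;
use LWP::RobotUA;
use WWW::RobotRules;

# Hypothetical robot name and contact address, used only for illustration.
my $agent = 'example-robot/0.1';
my $email = 'webmaster@example.com';

# Build a rules object and feed it a made-up robots.txt for www.example.com.
my $rules = WWW::RobotRules->new($agent);
my $robots_txt = join "\n",
    'User-agent: *',
    'Disallow: /private/',
    '';
$rules->parse('http://www.example.com/robots.txt', $robots_txt);

# Pass the prepared rules as the optional third argument.
my $rob = LWP::RobotUA->new($agent, $email, $rules);

# Ask the rules object which URLs the robot may fetch.
my $public_ok  = $rules->allowed('http://www.example.com/index.html');
my $private_ok = $rules->allowed('http://www.example.com/private/report.html');

print "User-Agent: ", $rob->agent, "\n";
print "index.html allowed?   ", ($public_ok  ? "yes" : "no"), "\n";
print "private page allowed? ", ($private_ok ? "yes" : "no"), "\n";
```

Because the robot is an LWP::UserAgent underneath, the same object can then be used with the ordinary request methods.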

Since LWP::RobotUA is a subclass of LWP::UserAgent, the LWP::UserAgent methods are used to perform the basic client activities. The following methods are defined by LWP::RobotUA for robot-related functionality.

20.2.2. LWP::Simple

LWP::Simple provides an easy-to-use interface for creating a web client, although it can perform only basic retrieval. It has no object constructor; instead, it exports functions for retrieving information from a specified URL and interpreting the status codes of the requests.

This module isn't named Simple for nothing. The following shows how to use it to get a web page and save it to a file:

use LWP::Simple;

$homepage = 'oreilly_com.html';
$status = getstore('http://www.oreilly.com/', $homepage);
print("hooray") if is_success($status);

The retrieving functions get and head return the URL's contents and header contents, respectively. The other retrieving functions return the HTTP status code of the request. The status codes are returned as the constants from the HTTP::Status module, which is also where the is_success and is_error functions come from. See Section 20.3.4, "HTTP::Status" for a listing of the response codes.
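To show how a returned status code might be interpreted, here is a short sketch that uses the HTTP::Status functions is_success, is_error, and status_message directly. The helper describe_status is a hypothetical name introduced for this example, not part of LWP.

```perl
use strict;
use warnings;
use HTTP::Status qw(is_success is_error status_message);

# Hypothetical helper: turn a numeric status code into a short report line.
sub describe_status {
    my ($code) = @_;
    my $kind = is_success($code) ? 'success'
             : is_error($code)   ? 'error'
             :                     'other';
    return "$code " . status_message($code) . " ($kind)";
}

print describe_status($_), "\n" for 200, 302, 404;
```

The same checks apply to the value returned by getstore or mirror.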

The user agent identifier produced by LWP::Simple is LWP::Simple/n.nn, in which n.nn is the version number of LWP being used.

The following are the functions exported by LWP::Simple.

get

get (url)

Returns the contents of the specified url. Upon failure, get returns undef. Because failure is signaled only by undef, get gives you no way to access the HTTP status code or headers returned by the server.
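A minimal sketch of handling the undef return follows. To keep it runnable without network access, it assumes a temporary local file addressed through a file: URL rather than a web server; the helper fetch_or_default and the fallback text are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use LWP::Simple qw(get);
use URI::file;

# Hypothetical helper: return the page body, or a fallback when get fails.
sub fetch_or_default {
    my ($url, $default) = @_;
    my $content = get($url);
    return defined $content ? $content : $default;
}

# Create a small local file and address it with a file: URL.
my ($fh, $path) = tempfile(SUFFIX => '.txt', UNLINK => 1);
print {$fh} "hello from LWP::Simple\n";
close $fh or die "close failed: $!";

my $good_url = URI::file->new_abs($path)->as_string;
my $bad_url  = $good_url . '.missing';    # no such file, so get returns undef

my $found   = fetch_or_default($good_url, '(unavailable)');
my $missing = fetch_or_default($bad_url,  '(unavailable)');

print "found:   $found";
print "missing: $missing\n";
```

With an http: URL the pattern is the same: test the result with defined before using it.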

20.2.3. LWP::UserAgent

Requests over the network are performed with LWP::UserAgent objects. To create an LWP::UserAgent object, use:

$ua = LWP::UserAgent->new( );

You give the object a request, which it uses to contact the server, and the information you requested is returned. The most frequently used method in this module is request, which contacts a server and returns the result of your query. Other methods in this module change the way request behaves. You can change the timeout value, customize the value of the User-Agent header, or use a proxy server.
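The configuration methods mentioned above can be sketched as follows. The agent name, timeout value, and proxy address are placeholders chosen for the example; no request is sent.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# Give up on a server that stays silent for more than 10 seconds.
$ua->timeout(10);

# Identify the client; the name and version here are made up.
$ua->agent('ExampleClient/0.1');

# Send HTTP and FTP requests through a proxy (hypothetical host and port).
$ua->proxy(['http', 'ftp'], 'http://proxy.example.com:8080/');

print "timeout: ", $ua->timeout, " seconds\n";
print "agent:   ", $ua->agent, "\n";
```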

The following methods are supplied by LWP::UserAgent.

put

$ua->put($url, [Header => Value])

Shortcut for $ua->request(HTTP::Request::Common::PUT( $url, Header => Value,...)).
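To see what the shortcut builds, the sketch below constructs the request with HTTP::Request::Common::PUT and inspects it without sending anything. The URL and body are placeholders.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(PUT);

# Build the request object that the put shortcut would hand to request.
# The URL and body are placeholders.
my $req = PUT(
    'http://www.example.com/notes/1.txt',
    'Content-Type' => 'text/plain',
    Content        => "remember the milk\n",
);

print $req->method, ' ', $req->uri, "\n";
print 'Content-Type: ', $req->header('Content-Type'), "\n";
print 'Content-Length: ', $req->header('Content-Length'), "\n";

# To send it (this would contact the server):
#   my $res = LWP::UserAgent->new->request($req);
```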

request

$ua->request($request, [file | $sub, size])

Performs a request for the resource specified by $request, which is an HTTP::Request object. Returns the information received from the server as an HTTP::Response object. Normally, doing a $ua->request($request) is enough. You can also specify a subroutine to process the data as it comes in or provide a filename in which to store the entity body of the response. The arguments are:

$request
An HTTP::Request object. The object must contain the method and URL of the site to be queried. This object must exist before request is called.

file
Name of the file in which to store the response's entity body. When this option is used on request, the entity body of the returned response object will be empty.

$sub
A reference to a subroutine that will process the data of the response. If you use the optional third argument, size, the subroutine will be called any time that number of bytes is received as response data. The subroutine should expect each chunk of the entity body data as a scalar in the first argument, an HTTP::Response object as the second argument, and an LWP::Protocol object as the third argument.

size
Optional argument specifying the number of bytes of the entity body received before the sub callback is called to process response data.
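The callback form of request can be sketched like this. To stay self-contained, the example assumes a temporary local file fetched through a file: URL in place of a remote resource; the file size, chunk size, and variable names are arbitrary choices for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTTP::Request;
use LWP::UserAgent;
use URI::file;

# Write 10,000 bytes to a temporary file to stand in for a remote resource.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} 'x' x 10_000;
close $fh or die "close failed: $!";

my $ua      = LWP::UserAgent->new;
my $request = HTTP::Request->new(GET => URI::file->new_abs($path));

# The callback receives each chunk, the response, and the protocol object.
my ($chunks, $bytes) = (0, 0);
my $on_chunk = sub {
    my ($data, $response, $protocol) = @_;
    $chunks++;
    $bytes += length $data;
};

# Ask for the callback to run roughly every 4096 bytes.
my $response = $ua->request($request, $on_chunk, 4096);

print "status: ", $response->status_line, "\n";
print "received $bytes bytes in $chunks chunks\n";
print "body kept in response: ", length($response->content), " bytes\n";
```

Passing a filename instead of the subroutine reference would store the same data in that file; in both cases the entity body is not kept in the returned response object.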




Copyright © 2002 O'Reilly & Associates. All rights reserved.