17.2 The LWP Modules

The LWP modules provide the core functionality for web programming in Perl. They contain the foundations for networking applications, protocol implementations, media type definitions, and debugging ability.

The modules LWP::Simple and LWP::UserAgent define client applications that implement network connections, send requests, and receive response data from servers. LWP::RobotUA is another client application that is used to build automated web searchers following a specified set of guidelines.

LWP::UserAgent is the primary module used in applications built with LWP. With it, you can build your own robust web client. It is also the foundation for the Simple and RobotUA modules: LWP::RobotUA is a subclass of it, and LWP::Simple is a procedural interface built on top of it. These two modules provide a specialized set of functions for creating clients.

Additional LWP modules provide the building blocks required for web communications, but you often don't need to use them directly in your applications. LWP::Protocol implements the actual socket connections with the appropriate protocol. The most common protocol is HTTP, but mail protocols (like SMTP), FTP for file transfers, and others can be used across networks.

LWP::MediaTypes implements the MIME definitions for media type identification and mapping to file extensions. The LWP::Debug module provides functions to help you debug your LWP applications.

The following sections describe the RobotUA, Simple, and UserAgent modules of LWP.

17.2.1 LWP::RobotUA

The Robot User Agent (LWP::RobotUA) is a subclass of LWP::UserAgent, and is used to create robot client applications. A robot application requests resources in an automated fashion. Robots perform such activities as searching, mirroring, and surveying. Some robots collect statistics, while others wander the Web and summarize their findings for a search engine.

The LWP::RobotUA module defines methods to help program robot applications and observes the Robot Exclusion Standards, which web server administrators can define on their web site to keep robots away from certain (or all) areas of the site.

The constructor for an LWP::RobotUA object looks like this:

    $rob = LWP::RobotUA->new(agent_name, email, [$rules]);

The first parameter, agent_name, is the user agent identifier that is used for the value of the User-Agent header in the request. The second parameter is the email address of the person using the robot, and the optional third parameter is a reference to a WWW::RobotRules object, which is used to store the robot rules for a server. If you omit the third parameter, the LWP::RobotUA module requests the robots.txt file from every server it contacts, and then generates its own WWW::RobotRules object.
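
As a minimal sketch of the constructor described above, the following builds a robot with an explicit rules object and then checks two URLs against a hand-written robots.txt. The robot name, email address, host name, and robots.txt text are all hypothetical values chosen for illustration, and no network request is made.

```perl
use strict;
use warnings;
use LWP::RobotUA;
use WWW::RobotRules;

# Hypothetical robot name and contact address (illustration only).
my $agent_name = 'ExampleBot/0.1';
my $email      = 'webmaster@example.com';

# Optional third argument: an object that stores the robot rules.
my $rules = WWW::RobotRules->new($agent_name);
my $rob   = LWP::RobotUA->new($agent_name, $email, $rules);

# Wait at least one minute between requests to the same server.
$rob->delay(1);

# Feed the rules object a hand-written robots.txt for a hypothetical host.
my $robots_txt = "User-agent: *\nDisallow: /private/\n";
$rules->parse('http://www.example.com/robots.txt', $robots_txt);

for my $url ('http://www.example.com/index.html',
             'http://www.example.com/private/data.html') {
    my $verdict = $rules->allowed($url) ? 'allowed' : 'disallowed';
    print "$url: $verdict\n";
}
```

In normal use you would not call parse yourself: the robot fetches and parses each server's robots.txt before requesting other resources from that server.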

Since LWP::RobotUA is a subclass of LWP::UserAgent, the LWP::UserAgent methods are used to perform the basic client activities. The following methods are defined by LWP::RobotUA for robot-related functionality:

17.2.2 LWP::Simple

LWP::Simple provides an easy-to-use interface for creating a web client, although it is only capable of performing basic retrieving functions. An object constructor is not used for this class; it defines functions to retrieve information from a specified URL and interpret the status codes from the requests.

This module isn't named Simple for nothing. The following lines show how to use it to get a web page and save it to a file:

    use LWP::Simple;
    $homepage = 'oreilly_com.html';
    $status = getstore('http://www.oreilly.com/', $homepage);
    print("hooray") if is_success($status);

The retrieving functions get and head return the URL's contents and header contents, respectively. The other retrieving functions return the HTTP status code of the request. The status codes are returned as the constants from the HTTP::Status module, which is also where the is_success and is_error functions are obtained.
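
To illustrate how the status helpers behave, here is a small sketch that classifies a few status constants without touching the network. It assumes the default export list of LWP::Simple, which passes along the HTTP::Status constants and helper functions. The fetch at the end runs only if you supply a URL on the command line.

```perl
use strict;
use warnings;
use LWP::Simple;   # default exports include the HTTP::Status constants and helpers

# Classify a few well-known status codes offline.
for my $status (RC_OK, RC_NOT_FOUND, RC_INTERNAL_SERVER_ERROR) {
    my $kind = is_success($status) ? 'success'
             : is_error($status)   ? 'error'
             :                       'other';
    printf "%d %s => %s\n", $status, status_message($status), $kind;
}

# get returns undef on failure, so test for definedness rather than truth.
# Nothing is fetched unless a URL is given as the first argument.
if (my $url = shift @ARGV) {
    my $content = get($url);
    print defined $content
        ? "fetched " . length($content) . " bytes\n"
        : "could not fetch $url\n";
}
```

Testing the result of get with defined matters because an empty document is a successful response, yet it evaluates as false in Perl.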
See Section 17.3.4, "HTTP::Status," later in this chapter for a listing of the response codes.

The user-agent identifier produced by LWP::Simple is

The following list describes the functions exported by LWP::Simple:

17.2.3 LWP::UserAgent

Requests over the network are performed with LWP::UserAgent objects. To create an LWP::UserAgent object, use:

    $ua = LWP::UserAgent->new;

You give the object a request, which it uses to contact the server, and the information you requested is returned. The most often used method in this module is request, which contacts a server and returns the result of your query. Other methods in this module change the way request behaves. You can change the timeout value, customize the value of the User-Agent header, or use a proxy server.
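
The following sketch shows one way these pieces might fit together: configure the user agent, build a request, and hand it to request. The agent string, timeout value, and proxy address are hypothetical, and the request is sent only when the script is run with a --fetch argument, so the configuration steps can be tried without network access.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new;

# Settings that change how request behaves (values are illustrative).
$ua->timeout(30);                   # give up after 30 seconds
$ua->agent('ExampleClient/0.1');    # value sent in the User-Agent header
# $ua->proxy('http', 'http://proxy.example.com:8080/');   # hypothetical proxy

# Build the request object that the user agent will send.
my $request = HTTP::Request->new(GET => 'http://www.oreilly.com/');

# Contact the server only when explicitly asked to.
if (@ARGV && $ARGV[0] eq '--fetch') {
    my $response = $ua->request($request);
    if ($response->is_success) {
        print length($response->content), " bytes received\n";
    }
    else {
        print "Request failed: ", $response->status_line, "\n";
    }
}
```

The call to request returns a response object whether or not the request succeeded, so the status check on that object is where failures are detected.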
The following methods are supplied by LWP::UserAgent: