20.5. The URI Module

The URI module contains functions and modules to specify and convert URIs. (URLs are a type of URI.) In addition to the URI module itself, there are also URI::URL, URI::Escape, and URI::Heuristic. Of primary importance to many LWP applications is the URI::URL class, which creates the objects used by LWP::UserAgent to determine protocols, server locations, and resource names.

The URI::Escape module replaces unsafe characters in URL strings with their appropriate escape sequences. URI::Heuristic provides convenience methods for creating proper URLs out of short strings and incomplete addresses.
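As a rough illustration of the escaping described above, the sketch below uses uri_escape and uri_unescape, two functions exported by URI::Escape. The input string is a made-up example, not something taken from this chapter.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape uri_unescape);

# Hypothetical string containing characters that are unsafe in a URL.
my $raw     = 'perl & xml.html';
my $escaped = uri_escape($raw);          # space and "&" become %XX sequences
my $decoded = uri_unescape($escaped);    # reverses the escaping

print "$escaped\n";    # perl%20%26%20xml.html
print "$decoded\n";    # perl & xml.html
```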

20.5.1. URI

The URI module is a successor to URI::URL and was written by Gisle Aas. While not clearly stated in the LWP documentation, you should use the URI module whenever possible, since URI.pm has essentially deprecated URI::URL.

The URI module implements the URI class. Objects created from the URI class represent Uniform Resource Identifiers (URIs). With the URI module, you can identify the key parts of a URI: the scheme, the scheme-specific part, and the fragment identifier. For URIs that use the generic hierarchical syntax, the scheme-specific part is further divided into authority, path, and query components. For example, as shown in the URI module documentation:

<scheme>:<scheme-specific-part>#<fragment>
<scheme>://<authority><path>?<query>#<fragment>
<path>?<query>#<fragment>

You can break down http://www.oreilly.com/somefile.html as:

scheme: http
authority: www.oreilly.com
path: /somefile.html

A relative URI reference omits the scheme (and usually the authority), which are implied by context, leaving only the path, query, and fragment. With the URI module, you can parse the absolute URI above as follows:

#!/usr/local/bin/perl -w

use URI;

my $url = 'http://www.oreilly.com/somefile.html';
my $u1 = URI->new($url);

print "scheme: ", $u1->scheme, "\n";
print "authority: ", $u1->authority, "\n";
print "path: ", $u1->path, "\n";

20.5.1.1. URI methods

The following methods give you access to components of a URI. Each method returns the component as a string, or undef if the component is not present. When called with an argument, a method also sets the component and returns its old value. Bear in mind that an empty string ("") is not equivalent to an undefined value.

new

new($uri, [$scheme])

Constructor. Builds a URI object from the string $uri, with an optional $scheme. new removes leading and trailing whitespace, as well as surrounding double quotes and angle brackets (<>), from the string. $scheme is used only when $uri is a relative URI; it is either a simple string that names the scheme or an absolute URI object. If $scheme isn't given, $uri is treated as a generic URI.
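The cleanup and scheme handling described for new can be sketched as follows. This is only an illustration; the variable names are arbitrary, and the exact class name printed for the relative URI is what current releases of the URI distribution produce.

```perl
use strict;
use warnings;
use URI;

# Angle-bracket wrappers are removed before the string is parsed.
my $wrapped = URI->new('<http://www.oreilly.com/somefile.html>');

# Leading and trailing whitespace is removed as well.
my $padded = URI->new('  http://www.oreilly.com/somefile.html  ');

# A relative reference with a scheme hint: the object becomes
# scheme-aware, but the URI itself still has no scheme.
my $relative = URI->new('somefile.html', 'http');

print $wrapped->as_string, "\n";    # http://www.oreilly.com/somefile.html
print $padded->as_string, "\n";     # http://www.oreilly.com/somefile.html
print ref($relative), "\n";         # URI::http
print defined $relative->scheme ? "has scheme\n" : "no scheme\n";
```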

new

URI::file->new($file, [$os])

Constructs a new file URI from a filename.

new_abs

URI::file->new_abs($file, [$os])

Constructs a new absolute file URI from a filename.
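A minimal sketch of building a file URI follows. It assumes a Unix-style path and passes 'unix' as $os so the result does not depend on the platform running the script. The exact string form of the URI can differ between releases of the module, so the comments only note the scheme and path.

```perl
use strict;
use warnings;
use URI::file;

# 'unix' forces Unix filename rules regardless of the host platform.
my $file_uri = URI::file->new('/usr/local/bin/perl', 'unix');

print "scheme: ", $file_uri->scheme, "\n";      # file
print "path:   ", $file_uri->path, "\n";        # /usr/local/bin/perl
print "uri:    ", $file_uri->as_string, "\n";
```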

abs

abs($base_uri)

Returns an absolute URI reference. If $uri is already absolute, then a reference to $uri is returned. If $uri is relative, abs returns a new absolute URI built by combining $uri with $base_uri.
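Both cases of abs can be sketched briefly. The /catalog/ base path and the relative path below are assumptions made up for the illustration.

```perl
use strict;
use warnings;
use URI;

# Hypothetical base URI; it must itself be absolute.
my $base = URI->new('http://www.oreilly.com/catalog/');

# A relative reference is resolved against the base.
my $relative = URI->new('perl/index.html');
my $resolved = $relative->abs($base);

# An absolute URI is returned unchanged.
my $absolute  = URI->new('http://www.oreilly.com/somefile.html');
my $unchanged = $absolute->abs($base);

print $resolved->as_string, "\n";     # http://www.oreilly.com/catalog/perl/index.html
print $unchanged->as_string, "\n";    # http://www.oreilly.com/somefile.html
```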

as_string

as_string

Returns a URI object as a plain string.

authority

authority([$auth])

Sets and gets the authority component of the $uri. This component will be escaped.

canonical

canonical

Returns a normalized version of the URI. This includes lowercasing the scheme and hostname components, as well as removing an explicit port specification (if it matches the default port). canonical will return the original $uri if $uri was already in the correct form.
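As a small sketch of this normalization, the URI below is written with an uppercase scheme, a mixed-case hostname, and an explicit default port.

```perl
use strict;
use warnings;
use URI;

my $messy = URI->new('HTTP://WWW.OReilly.COM:80/somefile.html');
my $clean = $messy->canonical;

# Scheme and host are lowercased; ":80" is dropped because it is
# the default port for http.
print $clean->as_string, "\n";    # http://www.oreilly.com/somefile.html
```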

clone

clone

Returns a copy of the URI.

cwd

URI::file->cwd

Returns the current working directory as a file URI.

default_port

default_port()

Returns the default port of the URI scheme that $uri belongs to. You cannot change the default port for a scheme.

eq

eq($other_uri)

Compares two URIs. URIs that normalize to the same string are considered equal.
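A brief sketch of a comparison: the two URIs differ only in an explicit default port and a trailing fragment, both chosen here purely for illustration.

```perl
use strict;
use warnings;
use URI;

my $first  = URI->new('http://www.oreilly.com:80/somefile.html');
my $second = URI->new('http://www.oreilly.com/somefile.html');
my $third  = URI->new('http://www.oreilly.com/somefile.html#top');

# The explicit default port does not matter, but the fragment does.
print $first->eq($second) ? "equal\n" : "different\n";    # equal
print $first->eq($third)  ? "equal\n" : "different\n";    # different
```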

fragment

fragment([$new_frag])

Sets and gets the fragment identifier of a URI reference as an escaped string.

host

host([$new_host])

Sets and gets the unescaped hostname. If the new value ends with a colon and a port number, the port is set as well:

$new_host = "hostname:port_number"

host_port

host_port([$new_host_port])

Sets and gets the host and port as a single unit. Hostname and port are colon-separated.
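The interplay between host, host_port, and port can be sketched as below. The replacement host www.example.com and port 8080 are placeholders, not values from this chapter.

```perl
use strict;
use warnings;
use URI;

my $uri = URI->new('http://www.oreilly.com/somefile.html');

# No explicit port, so host_port falls back to the scheme's default.
my $before = $uri->host_port;

# Placeholder host with a port; the trailing ":8080" also sets the port.
$uri->host('www.example.com:8080');

print "$before\n";                 # www.oreilly.com:80
print $uri->host, "\n";            # www.example.com
print $uri->port, "\n";            # 8080
print $uri->as_string, "\n";       # http://www.example.com:8080/somefile.html
```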

new_abs

new_abs($str, $base_uri)

Creates a new absolute URI object. $str can be a relative or an absolute URI. If $str is relative, it is made absolute using $base_uri as the base; $base_uri must itself be an absolute URI.
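A short sketch of new_abs resolving a relative string in one step. Both the relative path and the base path are invented for the example.

```perl
use strict;
use warnings;
use URI;

# Hypothetical relative reference and base; ".." climbs one directory.
my $logo = URI->new_abs('../images/logo.gif',
                        'http://www.oreilly.com/catalog/perl/');

print $logo->as_string, "\n";    # http://www.oreilly.com/catalog/images/logo.gif
```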

opaque

opaque([$new_opaque_value])

Sets and gets the scheme-specific part of $uri.

path

path([$path])

Sets and gets the escaped path component of $uri. Returns empty string ("") if there is no path.

path

path([$new_path])

Gets and sets the same value as opaque, unless the URI supports the generic syntax for hierarchical namespaces. In that case, path returns the part of the URI between the hostname and the fragment.

path_query

path_query([$path_here])

Sets and gets the escaped path and query components.

path_segments

path_segments([$seg])

Sets and gets the path. In a scalar context, path_segments is equivalent to path. In a list context, path_segments returns the unescaped path segments that make up the path.
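The difference between the two contexts can be sketched like this. The path, including its escaped space, is a made-up example.

```perl
use strict;
use warnings;
use URI;

# Hypothetical path containing an escaped space (%20).
my $uri = URI->new('http://www.oreilly.com/catalog/perl%20books/index.html');

my $as_scalar = $uri->path_segments;    # scalar context: same as path
my @segments  = $uri->path_segments;    # list context: unescaped segments

print "$as_scalar\n";                   # /catalog/perl%20books/index.html

# The leading "/" yields an empty first segment.
print join('|', @segments), "\n";       # |catalog|perl books|index.html
```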

port

port([$new_port])

Sets and gets the port, which is an integer. If the URI does not specify a port explicitly, then the default port of the URI scheme will be returned.

query

query([$q])

Sets and gets the escaped query component of $uri.

query_form

query_form([$key => $val])

Sets and gets query components that use the urlencoded format.

query_keywords

query_keywords([$keywords])

Sets and gets query components that use keywords separated by a +.
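The two query styles can be sketched side by side. The /search path and the parameter names q and page are hypothetical.

```perl
use strict;
use warnings;
use URI;

# Hypothetical search URL.
my $form = URI->new('http://www.oreilly.com/search');

# urlencoded key/value pairs; a space in a value is written as "+".
$form->query_form(q => 'perl uri', page => 2);
my %params = $form->query_form;          # read the pairs back, unescaped

# Keyword style: plain words joined with "+".
my $keywords = URI->new('http://www.oreilly.com/search');
$keywords->query_keywords('perl', 'uri');
my @words = $keywords->query_keywords;

print $form->query, "\n";        # q=perl+uri&page=2
print $params{q}, "\n";          # perl uri
print $keywords->query, "\n";    # perl+uri
print scalar(@words), "\n";      # 2
```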

rel

rel($base_uri)

Returns a relative URI reference, if it is possible to make one that denotes the same resource relative to $base_uri. Otherwise, $uri is returned.
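A sketch of both outcomes of rel follows. The /catalog/ paths and the www.example.com base are assumptions chosen for the illustration.

```perl
use strict;
use warnings;
use URI;

my $uri = URI->new('http://www.oreilly.com/catalog/perl/index.html');

# Same scheme and host: a relative reference can be made.
my $relative = $uri->rel('http://www.oreilly.com/catalog/');

# Placeholder base on a different host: no relative form exists,
# so the URI comes back as it was.
my $unchanged = $uri->rel('http://www.example.com/catalog/');

print $relative->as_string, "\n";     # perl/index.html
print $unchanged->as_string, "\n";    # http://www.oreilly.com/catalog/perl/index.html
```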

scheme

scheme([$some_scheme])

Sets and gets the scheme part of the URI. Supported schemes include data, file, ftp, gopher, http, https, ldap, mailto, news, nntp, pop, rlogin, rsync, snews, telnet, and ssh. For a relative URI, scheme returns undef; otherwise, it returns the scheme in lowercase. With $some_scheme, scheme sets the scheme of the current URI. scheme will die if the new scheme name is illegal, for example if it contains non-US-ASCII characters.
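The getter and setter behavior of scheme can be sketched as follows; the switch to https is just an example value.

```perl
use strict;
use warnings;
use URI;

my $uri = URI->new('HTTP://www.oreilly.com/somefile.html');
my $current = $uri->scheme;             # reported in lowercase

# Setting a new scheme returns the old one.
my $previous = $uri->scheme('https');

# A relative URI has no scheme at all.
my $relative = URI->new('somefile.html');

print "$current\n";                     # http
print "$previous\n";                    # http
print $uri->as_string, "\n";            # https://www.oreilly.com/somefile.html
print defined $relative->scheme ? "has scheme\n" : "undef\n";    # undef
```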

userinfo

userinfo([$new_userinfo])

Sets and gets the escaped "userinfo" part of the authority component (of the URI). Often, the userinfo will appear as a username and password separated by a colon. Bear in mind that sending a password in the clear is a bad idea.
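A minimal sketch of reading and clearing the userinfo part. The username and password are obviously fake placeholders.

```perl
use strict;
use warnings;
use URI;

# Placeholder credentials embedded in the authority component.
my $uri = URI->new('http://user:secret@www.oreilly.com/somefile.html');

my $info = $uri->userinfo;    # everything before the "@"
my $host = $uri->host;        # userinfo is not part of the host

# Passing undef removes the userinfo from the URI.
$uri->userinfo(undef);

print "$info\n";                 # user:secret
print "$host\n";                 # www.oreilly.com
print $uri->as_string, "\n";     # http://www.oreilly.com/somefile.html
```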

20.5.3. URI::URL

This module creates URL objects that store all the elements of a URL. These objects are used by the request method of LWP::UserAgent for server addresses, port numbers, filenames, protocol, and many other elements that can be loaded into a URL.

The new constructor is used to make a URI::URL object:

$url = URI::URL->new($url_string [, $base_url])

This method creates a new URI::URL object with the URL given as the first parameter. An optional base URL can be specified as the second parameter and is useful for generating an absolute URL from a relative URL.
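The base URL parameter can be sketched as below, assuming the URI::URL compatibility module that ships with the URI distribution. The /catalog/ base path is a made-up example.

```perl
use strict;
use warnings;
use URI::URL;

# Relative URL plus a hypothetical base URL.
my $url = URI::URL->new('somefile.html', 'http://www.oreilly.com/catalog/');

# With no argument, abs resolves against the base given to new.
my $absolute = $url->abs;

print $absolute->as_string, "\n";    # http://www.oreilly.com/catalog/somefile.html
```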

The following methods are for the URI::URL class.




Copyright © 2002 O'Reilly & Associates. All rights reserved.