3.6. XML::XPath
We've seen
examples of parsers that dutifully deliver the entire document to
you. Often, though, you don't need the whole thing.
When you query a database, you're usually looking
for only a single record. When you crack open a telephone book,
you're not going to sit down and read the whole
thing. What you need is a mechanism for extracting a
specific piece of information from a vast document. Look no further
than XPath.
XPath is a recommendation from the W3C, the folks who brought you
XML.[18] It's a grammar for writing expressions
that pinpoint specific pieces of documents. Think of it as an
addressing scheme. Although we'll save the
nitty-gritty of XPath wrangling for Chapter 8, "Beyond Trees: XPath, XSLT, and More", we
can tantalize you by revealing that it works much like a mix of
regular expressions and Unix-style file paths. Not surprisingly,
this makes it an attractive feature to add to parsers.
Matt Sergeant's XML::XPath module
is a solid implementation, built on the foundation of
XML::Parser. Given an XPath expression, it returns
a list of all document parts that match the description.
It's an incredibly simple way to perform some
powerful search and retrieval work.
For instance, suppose we have an address book encoded in XML in this
basic form:
<contacts>
  <entry>
    <name>Bob Snob</name>
    <street>123 Platypus Lane</street>
    <city>Burgopolis</city>
    <state>FL</state>
    <zip>12345</zip>
  </entry>
  <!--More entries go here-->
</contacts>
Suppose you want to extract all the zip codes from the file and
compile them into a list. Example 3-7 shows how you
could do it with XPath.
Example 3-7. Zip code extractor
use strict;
use warnings;
use XML::XPath;

my $file = 'customers.xml';
my $xp = XML::XPath->new(filename=>$file);

# An XML::XPath nodeset is an object which contains the result of
# smacking an XML document with an XPath expression; we'll do just
# this, and then query the nodeset to see what we get.
my $nodeset = $xp->find('//zip');

my @zipcodes;                   # Where we'll put our results
if (my @nodelist = $nodeset->get_nodelist) {
  # We found some zip elements! Each node is an object of the class
  # XML::XPath::Node::Element, so I'll use that class's 'string_value'
  # method to extract its pertinent text, and throw the result for all
  # the nodes into our array.
  @zipcodes = map($_->string_value, @nodelist);

  # Now sort and prepare for output
  @zipcodes = sort(@zipcodes);
  local $" = "\n";
  print "I found these zipcodes:\n@zipcodes\n";
} else {
  print "The file $file didn't have any 'zip' elements in it!\n";
}
Run the program on a document with three entries and
we'll get something like this:
I found these zipcodes:
03642
12333
82649
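To give a feel for the "addressing scheme" side of XPath, here is a minimal sketch that narrows the search with a predicate, so that only zip codes from Florida entries come back. The inline document and the second entry (Ann Example) are made up for illustration, and we assume the module's xml constructor option and its findnodes method, which returns a plain list of nodes in list context.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical sample data in the same address-book form as above.
my $xml = <<'XML';
<contacts>
  <entry>
    <name>Bob Snob</name>
    <state>FL</state>
    <zip>12345</zip>
  </entry>
  <entry>
    <name>Ann Example</name>
    <state>NY</state>
    <zip>10001</zip>
  </entry>
</contacts>
XML

# Parse from a string instead of a file.
my $xp = XML::XPath->new(xml => $xml);

# The predicate in square brackets keeps only those entry elements
# whose state child has the text "FL"; we then step down to zip.
my @fl_zips = map { $_->string_value }
              $xp->findnodes('//entry[state="FL"]/zip');

print "Florida zip codes: @fl_zips\n";
```

The path reads much like a filesystem path, while the bracketed predicate plays the role of a filter on each step.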
This module also shows an example of tree-based parsing, by the way,
as its parser loads the whole document into an object tree of its own
design and then allows the user to selectively interact with parts of
it via XPath expressions. This example is just a sample of what you
can do with advanced tree processing modules. You'll
see more of these modules in Chapter 8, "Beyond Trees: XPath, XSLT, and More".
XML::LibXML's element objects
support a findnodes( ) method that works much like
XML::XPath's, using the invoking
Element object as the current context and
returning a list of objects that match the query.
We'll play with this functionality later in
Chapter 10, "Coding Strategies".
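As a small preview, the following sketch shows the context-node idea: each entry element runs its own relative queries. It assumes a version of XML::LibXML recent enough to provide the load_xml constructor, and the inline document and the %zip_for hash are invented for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data in the address-book form used earlier.
my $xml = <<'XML';
<contacts>
  <entry>
    <name>Bob Snob</name>
    <zip>12345</zip>
  </entry>
  <entry>
    <name>Ann Example</name>
    <zip>10001</zip>
  </entry>
</contacts>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

my %zip_for;    # name => zip code
for my $entry ($doc->documentElement->findnodes('entry')) {
    # Relative paths are resolved against $entry, the invoking element,
    # so each query sees only that entry's own children.
    my ($name_node) = $entry->findnodes('name');
    my ($zip_node)  = $entry->findnodes('zip');
    next unless $name_node && $zip_node;
    $zip_for{ $name_node->textContent } = $zip_node->textContent;
}

print "$_: $zip_for{$_}\n" for sort keys %zip_for;
```

Because the query is relative to each entry, a name and a zip code found in the same pass through the loop are guaranteed to belong to the same record.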
Copyright © 2002 O'Reilly & Associates. All rights reserved.