22.3.3. Discussion
An XML processor that uses SAX has three
parts: the XML parser that generates SAX events, the handler that
reacts to them, and the stub that connects the two. The XML parser
can be XML::Parser, XML::LibXML, or the pure Perl XML::SAX::PurePerl
that comes with XML::SAX. The XML::SAX::ParserFactory module selects
a parser for you and connects it to your handler. Your handler takes
the form of a class that inherits from XML::SAX::Base. The stub is
the program shown in the Solution.
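Here is a minimal sketch of all three parts in one file. It assumes XML::SAX, and therefore at least one registered SAX parser, is installed. ElementCounter and count_elements are hypothetical names made up for this illustration. The stub is the part that asks XML::SAX::ParserFactory for a parser, hands it the handler, and calls parse_string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Handler: a hypothetical class that counts every element it sees.
package ElementCounter;
use base qw(XML::SAX::Base);

sub start_element {
    my ($self, $data) = @_;
    $self->{count}++;    # XML::SAX::Base objects are blessed hashes
}

package main;
use XML::SAX::ParserFactory;

# Stub: let the factory pick a parser and connect it to our handler.
sub count_elements {
    my ($xml) = @_;
    my $handler = ElementCounter->new;
    my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
    $parser->parse_string($xml);
    return $handler->{count} || 0;
}

print count_elements('<books><book/><book/></books>'), "\n";
```

The handler never sees the parser directly; swapping XML::SAX::PurePerl for XML::LibXML changes only which parser the factory returns.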
The XML::SAX::Base module provides stubs for the different methods that
the XML parser calls on your handler. Those methods, listed in
Table 22-2, are the ones defined by the SAX1 and SAX2 standards at
http://www.saxproject.org/. The Perl implementation uses more Perl-ish
data structures and is described in the XML::SAX::Intro manpage.
Table 22-2. XML::SAX::Base methods
start_document          end_document            characters
start_element           end_element             processing_instruction
ignorable_whitespace    set_document_locator    skipped_entity
start_prefix_mapping    end_prefix_mapping      comment
start_cdata             end_cdata               entity_reference
notation_decl           unparsed_entity_decl    element_decl
attlist_decl            doctype_decl            xml_decl
entity_decl             attribute_decl          internal_entity_decl
start_dtd               end_dtd                 external_entity_decl
resolve_entity          start_entity            end_entity
warning                 error                   fatal_error
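Because XML::SAX::Base supplies a stub for every method in Table 22-2, a handler needs to define only the events it cares about. The following sketch overrides three methods and uses start_document to reset its state, so one handler object can be reused across documents. DepthTracker and max_depth are hypothetical names; the sketch assumes XML::SAX is installed.

```perl
use strict;
use warnings;

# Hypothetical handler that overrides only three of the methods in
# Table 22-2; every other event goes to the inherited XML::SAX::Base stubs.
package DepthTracker;
use base qw(XML::SAX::Base);

# Reset per-document state here so one handler object can be reused.
sub start_document {
    my ($self, $doc) = @_;
    $self->{depth} = 0;
    $self->{max}   = 0;
}

sub start_element {
    my ($self, $data) = @_;
    $self->{depth}++;
    $self->{max} = $self->{depth} if $self->{depth} > $self->{max};
}

sub end_element {
    my ($self, $data) = @_;
    $self->{depth}--;
}

package main;
use XML::SAX::ParserFactory;

my $tracker = DepthTracker->new;

# Parse one document with a fresh parser but the same handler object.
sub max_depth {
    my ($xml) = @_;
    XML::SAX::ParserFactory->parser(Handler => $tracker)->parse_string($xml);
    return $tracker->{max};
}

print max_depth('<a><b><c/></b></a>'), "\n";
print max_depth('<a/>'), "\n";
```

Without the start_document reset, the second call would still report the depth from the first document.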
The two data structures you need most often are those representing
elements and attributes. The $data parameter to
start_element and end_element
is a hash reference. The keys of the hash are given in Table 22-3.
Table 22-3. An XML::SAX element hash
Key           Meaning
Prefix        XML namespace prefix (e.g., email:)
LocalName     Element name without prefix (e.g., to)
Name          Fully qualified element name (e.g., email:to)
Attributes    Hash of attributes of the element
NamespaceURI  URI of the XML namespace for this element
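To see the element hash in action, this sketch records Name, LocalName, and NamespaceURI for each element it meets. It assumes XML::SAX is installed and that the parser chosen by XML::SAX::ParserFactory reports namespaces. ElementLogger and element_keys are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical handler that records a few keys of each element hash.
package ElementLogger;
use base qw(XML::SAX::Base);

sub start_element {
    my ($self, $data) = @_;
    push @{ $self->{seen} },
        join '|', $data->{Name}, $data->{LocalName}, $data->{NamespaceURI} // '';
}

package main;
use XML::SAX::ParserFactory;

# Return one "Name|LocalName|NamespaceURI" string per element, in order.
sub element_keys {
    my ($xml) = @_;
    my $handler = ElementLogger->new;
    XML::SAX::ParserFactory->parser(Handler => $handler)->parse_string($xml);
    return @{ $handler->{seen} || [] };
}

my $xml = '<email:msg xmlns:email="http://example.com/dtds/mailspec/">'
        . '<email:to>Gnat</email:to></email:msg>';
print "$_\n" for element_keys($xml);
```

Testing LocalName together with NamespaceURI, rather than Name, keeps a handler working even if a document chooses a different prefix for the same namespace.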
The Attributes hash has a key for each attribute of the element. The
key is structured as "{namespaceURI}attrname". For example, if the
attribute msgid belongs to the namespace
http://example.com/dtds/mailspec/, its key in the Attributes hash
is:
{http://example.com/dtds/mailspec/}msgid
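The key format is easy to build by hand. This pure-Perl sketch needs no SAX parser: attr_key is a hypothetical helper, %attributes is a hand-built sample shaped like a real Attributes hash, and keying an attribute with no namespace as "{}name" is an assumption based on the PerlSAX2 convention. The value under each key is itself a hash, described in Table 22-4.

```perl
use strict;
use warnings;

# Hypothetical helper: build an Attributes key in "{namespaceURI}attrname" form.
# Assumption: an attribute with no namespace gets empty braces, "{}name".
sub attr_key {
    my ($ns, $local) = @_;
    return '{' . ($ns // '') . '}' . $local;
}

# Hand-built sample, shaped like $data->{Attributes} in start_element;
# the values are illustrative only.
my %attributes = (
    '{http://example.com/dtds/mailspec/}msgid' => {
        Name         => 'email:msgid',
        LocalName    => 'msgid',
        NamespaceURI => 'http://example.com/dtds/mailspec/',
        Value        => '1234',
    },
);

my $key = attr_key('http://example.com/dtds/mailspec/', 'msgid');
print "$key => $attributes{$key}{Value}\n";
```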
The value stored under each key is itself a hash describing that
attribute; its keys are given in Table 22-4.
Table 22-4. An XML::SAX attribute hash
Key           Meaning
Prefix        XML namespace prefix (e.g., email:)
LocalName     Attribute name without prefix (e.g., msgid)
Name          Fully qualified attribute name (e.g., email:msgid)
Value         Value of the attribute
NamespaceURI  URI of the XML namespace for this attribute
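In a handler you usually reach into the Attributes hash from start_element. This sketch collects the Value of each book element's id attribute. BookIds and book_ids are hypothetical names; it assumes XML::SAX is installed and that an attribute without a namespace is keyed as "{}id".

```perl
use strict;
use warnings;

# Hypothetical handler that collects the id attribute of every <book>.
package BookIds;
use base qw(XML::SAX::Base);

sub start_element {
    my ($self, $data) = @_;
    return unless $data->{LocalName} eq 'book';
    # Assumption: attributes with no namespace are keyed as "{}name".
    my $attr = $data->{Attributes}{'{}id'} or return;    # skip books without id
    push @{ $self->{ids} }, $attr->{Value};
}

package main;
use XML::SAX::ParserFactory;

sub book_ids {
    my ($xml) = @_;
    my $handler = BookIds->new;
    XML::SAX::ParserFactory->parser(Handler => $handler)->parse_string($xml);
    return @{ $handler->{ids} || [] };
}

print join(',', book_ids('<books><book id="b1"/><book/><book id="b2"/></books>')), "\n";
```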
Example 22-4 shows how to list the book titles using
SAX events. It's more complex than the DOM solution because SAX hands
us a stream of events rather than a tree, so the handler itself must
keep track of where we are in the XML document.
Example 22-4. sax-titledumper
# TitleDumper.pm -- SAX handler to display titles in books file
package TitleDumper;
use strict;
use warnings;
use base qw(XML::SAX::Base);

# how many <title> elements we are currently inside
my $in_title = 0;

# if we're entering a title, increase $in_title
sub start_element {
    my ($self, $data) = @_;
    if ($data->{Name} eq 'title') {
        $in_title++;
    }
}

# if we're leaving a title, decrease $in_title and print a newline
sub end_element {
    my ($self, $data) = @_;
    if ($data->{Name} eq 'title') {
        $in_title--;
        print "\n";
    }
}

# if we're in a title, print any text we get
sub characters {
    my ($self, $data) = @_;
    if ($in_title) {
        print $data->{Data};
    }
}

1;
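One possible refinement, sketched below, keeps the state in the handler object instead of a file-scoped lexical, matches on LocalName so a namespace prefix on title doesn't defeat the test, and appends text because a parser is free to split one run of text across several characters calls. TitleCollector and titles are hypothetical names, and the sketch assumes XML::SAX is installed.

```perl
use strict;
use warnings;

# Hypothetical variant of TitleDumper that collects titles instead of printing.
package TitleCollector;
use base qw(XML::SAX::Base);

# Per-document state lives in the object, reset at the start of each parse.
sub start_document {
    my ($self) = @_;
    $self->{titles}   = [];
    $self->{in_title} = 0;
}

sub start_element {
    my ($self, $data) = @_;
    return unless $data->{LocalName} eq 'title';
    $self->{in_title}++;
    push @{ $self->{titles} }, '';    # start a fresh buffer for this title
}

sub end_element {
    my ($self, $data) = @_;
    $self->{in_title}-- if $data->{LocalName} eq 'title';
}

# Text may arrive in pieces, so append to the current title's buffer.
sub characters {
    my ($self, $data) = @_;
    $self->{titles}[-1] .= $data->{Data} if $self->{in_title};
}

package main;
use XML::SAX::ParserFactory;

sub titles {
    my ($xml) = @_;
    my $handler = TitleCollector->new;
    XML::SAX::ParserFactory->parser(Handler => $handler)->parse_string($xml);
    return @{ $handler->{titles} };
}

print "$_\n" for titles(
    '<books><book><title>Perl Cookbook</title></book>'
  . '<book><title>Programming Perl</title></book></books>'
);
```

Returning a list rather than printing also makes the handler easier to reuse in other programs.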
The XML::SAX::Intro manpage provides a gentle introduction to
XML::SAX parsing.