With the tree-based strategy, the parser keeps
the data to itself until the very end, when it presents a complete
model of the document to your program. Instead of a pipeline,
it's like a camera that takes a picture and
transmits the replica to you. The model is usually in a much more
convenient state than raw XML. For example, nested elements may be
represented in native Perl structures like lists or hashes, as we saw
in an earlier example. Even more useful are trees of blessed objects
with methods that help navigate the structure from one place to
another. The whole point of this strategy is that your program can
pull out any data it needs, in any order.
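As a rough sketch of the tree-based approach, the snippet below asks XML::Parser for its Tree style, which hands back the whole document as nested arrays, and then reads values in whatever order is convenient. The sample document and the child_text helper are invented for illustration; they are not part of any module.

```perl
use strict;
use warnings;
use XML::Parser;

# A made-up document for illustration.
my $xml = '<book id="b1"><title>Perl and XML</title>'
        . '<chapter>Intro</chapter><chapter>Streams</chapter></book>';

# Style => 'Tree' makes parse() return the complete document as nested arrays:
#   [ tag, [ \%attributes, tag_or_0, content, tag_or_0, content, ... ] ]
my $tree = XML::Parser->new(Style => 'Tree')->parse($xml);
my ($root_name, $root_content) = @$tree;

# Hypothetical helper: return the text of every child element named $name.
sub child_text {
    my ($content, $name) = @_;
    my @found;
    # Index 0 holds the attribute hash; tag/content pairs follow it.
    for (my $i = 1; $i < @$content; $i += 2) {
        my ($tag, $kid) = @{$content}[$i, $i + 1];
        next unless $tag eq $name;
        # Inside an element, text shows up under the pseudo-tag 0.
        my $text = '';
        for (my $j = 1; $j < @$kid; $j += 2) {
            $text .= $kid->[$j + 1] if $kid->[$j] eq '0';
        }
        push @found, $text;
    }
    return @found;
}

# Nothing forces document order here: chapters first, then title, then id.
my @chapters = child_text($root_content, 'chapter');
my ($title)  = child_text($root_content, 'title');
my $id       = $root_content->[0]{id};
print "$title ($id): last chapter is $chapters[-1]\n";
```

The raw array layout is clumsy to navigate by hand, which is why object-based tree modules tend to be more pleasant for anything beyond a small script.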
Why would you prefer one over the other? Each has strong and weak
points. Event streams are fast and often have a much slimmer memory
footprint, but at the expense of greater code complexity and
impermanent data. Tree building, on the other hand, lets the data
stick around for as long as you need it, and your code is usually
simple because you don't need special tricks to do
things like backwards searching. However, trees wither when it comes
to economical use of processor time and memory.
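To make the trade-off a little more concrete, here is a minimal sketch of the kind of bookkeeping a stream handler needs just to answer "what is my parent element?". It uses XML::Parser's Start, Char, and End handlers; the sample document and the @open and @titles variables are assumptions made for this example.

```perl
use strict;
use warnings;
use XML::Parser;

my @open;      # names of the elements currently open (the only context kept)
my @titles;    # chapter titles collected so far

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $name) = @_;
            # Begin a new title only when the enclosing element is a chapter.
            push @titles, '' if $name eq 'title' && @open && $open[-1] eq 'chapter';
            push @open, $name;
        },
        Char => sub {
            my ($expat, $string) = @_;
            # Looking "backward" means consulting the stack we maintain ourselves.
            $titles[-1] .= $string
                if @open >= 2 && $open[-1] eq 'title' && $open[-2] eq 'chapter';
        },
        End => sub { pop @open },
    },
);

# A made-up document: the book title must be skipped, chapter titles kept.
$parser->parse('<book><title>Big Book</title>'
             . '<chapter><title>One</title></chapter>'
             . '<chapter><title>Two</title></chapter></book>');

print "$_\n" for @titles;
```

Only the stack of open element names and the collected titles stay in memory, so the footprint does not grow with the size of the rest of the document. The price is that every question about context has to be anticipated in the handlers.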
All of this is relative, of course. Small documents
don't cause much hardship to a typical computer,
especially since CPU cycles and megabytes are getting cheaper every
day. Maybe the convenience of a persistent data structure will
outweigh any drawbacks. On the other hand, when working with
Godzilla-sized documents like books, or huge numbers of documents all
at once, you'll definitely notice the crunch. Then
the agility of event stream processors will start to look better.
It's impossible to give you any hard-and-fast rules,
so we'll leave the decision up to you.
An interesting thing to note about the stream-based and tree-based
strategies is that one is the basis for the other.
That's right, an event stream drives the process of
building a tree data structure. Thus, most low-level parsers are
event-stream parsers, because you can always write a tree-building layer on
top. This is how XML::Parser and most other
parsers work.
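The layering described above can be sketched in a few lines: three event handlers and a stack are enough to assemble a tree. This is an illustration rather than the way any particular module does it, and the node layout (name, attrs, children, text) is an assumption of this sketch.

```perl
use strict;
use warnings;
use XML::Parser;

# Build a tree of plain hashes out of start, character, and end events.
sub build_tree {
    my ($xml) = @_;
    my $root;
    my @stack;    # elements that are open at the moment

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ($expat, $name, %attrs) = @_;
                my $node = {
                    name     => $name,
                    attrs    => {%attrs},
                    children => [],
                    text     => '',
                };
                # Attach to the innermost open element, or become the root.
                if (@stack) { push @{ $stack[-1]{children} }, $node }
                else        { $root = $node }
                push @stack, $node;
            },
            Char => sub {
                my ($expat, $string) = @_;
                $stack[-1]{text} .= $string if @stack;
            },
            # An end event closes the innermost open element.
            End => sub { pop @stack },
        },
    );
    $parser->parse($xml);
    return $root;
}

my $tree = build_tree(
    '<list><item n="1">apples</item><item n="2">pears</item></list>');
print "$_->{attrs}{n}: $_->{text}\n" for @{ $tree->{children} };
```

Once build_tree returns, the events are gone but the structure remains, so the caller can visit the items in any order it likes.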
In a related, more recent, and very cool development, you can also use
XML event streams to turn any kind of document into some form of XML by
writing stream-based parsers that generate XML events from whatever
data structures lurk in that document type.
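A toy version of the idea, assuming a simple "key = value" text format: the parse_config driver below is hypothetical and uses no XML module at all. It reads non-XML input and reports it through start, chars, and end callbacks, so any consumer written for XML-style events can process it. Here the consumer simply serializes the events as XML text.

```perl
use strict;
use warnings;

# Hypothetical driver: reads "key = value" lines and reports them
# as XML-style events through the callbacks it is given.
sub parse_config {
    my ($text, %handler) = @_;
    $handler{start}->('config');
    for my $line (split /\n/, $text) {
        # Lines that do not look like key = value are ignored.
        next unless $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/;
        my ($key, $value) = ($1, $2);
        $handler{start}->($key);
        $handler{chars}->($value);
        $handler{end}->($key);
    }
    $handler{end}->('config');
}

# One possible consumer: write the events out as XML text.
my %escape = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
my $out = '';
parse_config(
    "host = example.org\nport = 8080\n",
    start => sub { $out .= "<$_[0]>" },
    end   => sub { $out .= "</$_[0]>" },
    chars => sub {
        (my $s = $_[0]) =~ s/([&<>])/$escape{$1}/g;
        $out .= $s;
    },
);
print "$out\n";
```

Swapping in a different set of callbacks, for example ones that build a tree, would reuse the same driver unchanged.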