To test this theory, we ran the program on a 3 MB document, first
without and then with the line shown above. Without flushing, the
program's heap space grew to over 30 MB.
It's staggering to see how much memory an object-oriented tree
processor needs -- in this case ten times the size of the file. But
with flushing enabled, the program's memory usage hovered around only
a few MB, a savings of about 90 percent.
In both cases, the entire tree is eventually built, so the total
processing time is about the same. To save CPU cycles as well as
memory, we need to use multiple roots mode.
In multiple roots mode, you specify, before parsing, the roots of
the twigs that you want built. You will save significant time and
memory if the twigs are much smaller than the document as a whole. In
our chunk mode example, we probably can't do much to speed up the
process, since the combined size of the <chapter> elements is about
the same as the size of the document. So let's focus on an example
that fits the profile.
Example 8-12. A many-twigged program
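use XML::Twig;

# Build twigs only for <title> elements that are children of <chapter>.
my $twig = XML::Twig->new( TwigRoots => { 'chapter/title' => \&output_title } );
$twig->parsefile( shift @ARGV );

# Called with the twig object and each completed <title> element.
sub output_title {
    my( $tree, $elem ) = @_;
    print $elem->text, "\n";
}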
The key line here is the one with the keyword TwigRoots. It's set to
a hash of handlers and works much like the TwigHandlers option that we
saw earlier. The difference is that instead of building the whole
document tree, the program builds only trees whose roots are <title>
elements inside <chapter> elements. These make up a small fraction of
the whole document, so we can expect the time and memory savings to be
high.
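To see which elements a path such as 'chapter/title' actually selects,
it can help to try the same technique on a small in-memory document.
The following is only a minimal sketch: the helper name chapter_titles
and the sample document are invented for illustration, and it assumes
an XML::Twig version that provides the parse method for strings.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: return the text of every <title> whose parent
# is a <chapter>, building twigs for those elements only.
sub chapter_titles {
    my ($xml) = @_;
    my @titles;
    my $twig = XML::Twig->new(
        TwigRoots => {
            'chapter/title' => sub {
                my ( $tree, $elem ) = @_;
                push @titles, $elem->text;
            },
        },
    );
    $twig->parse($xml);    # parse a string rather than a file
    return @titles;
}

# Made-up sample document; the book-level <title> is not under a
# <chapter>, so it should not be selected.
my $sample = <<'XML';
<book>
  <title>Book title</title>
  <chapter><title>First</title><para>Lots of text.</para></chapter>
  <chapter><title>Second</title><para>More text.</para></chapter>
</book>
XML

print "$_\n" for chapter_titles($sample);
```

Run as written, this should print the two chapter titles and skip the
book-level title, because only <title> elements directly under a
<chapter> match the path.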
How high? Running the program on the same test data, we saw memory
usage barely reach 2 MB, and the total processing time was 13
seconds. Compare that to the 30 MB of memory (the size required to
build the whole tree) and the full minute it takes to grind out the
titles when the whole tree is built. That is roughly one-fifteenth of
the memory and less than a quarter of the CPU time.