XML::Grove (Perl and XML)

6.6. XML::Grove

The last object model we'll examine before jumping into standards-based solutions is Ken MacLeod's XML::Grove. Like XML::SimpleObject, it takes the output of XML::Parser in tree mode and turns it into an object hierarchy. The difference is that each node type is represented by its own class: an element maps to XML::Grove::Element, a processing instruction to XML::Grove::PI, and so on. Text nodes, however, remain plain scalar values.
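Because each kind of node gets its own class, Perl's built-in ref( ) is enough to tell them apart, and it returns an empty string for a plain text scalar. The following sketch does not load XML::Grove. It only blesses empty hashes into the same package names as hypothetical stand-ins, to show what ref( ) reports for each kind of node.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: empty hashes blessed into the package names
# XML::Grove uses. Real grove nodes carry data; these carry only a class.
my @nodes = (
    bless( {}, 'XML::Grove::Element' ),
    bless( {}, 'XML::Grove::PI' ),
    bless( {}, 'XML::Grove::Comment' ),
    "some character data",    # text is a plain scalar, not an object
);

# ref() yields the package name for a blessed reference and the
# empty string for an ordinary scalar
my @types = map { ref($_) || '(plain scalar)' } @nodes;
print "$_\n" for @types;
```

This is the same property the tabulate( ) routine later in this section relies on when it branches on the node's class.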

Another feature of this module is that the declarations in the internal subset are captured in lists accessible through the XML::Grove object, so every entity or notation declaration is available for your perusal. The example program in this section explores the node tree instead: it counts the distribution of elements and other nodes, then prints a list of node types and their frequencies.

First, we initialize the parser with the style "grove", which tells XML::Parser to use XML::Parser::Grove to process its output:

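use strict;
use warnings;
use XML::Parser;
use XML::Parser::Grove;
use XML::Grove;

# 'grove' style hands tree building to XML::Parser::Grove;
# NoExpand asks XML::Parser not to expand internal-subset entity references
my $parser = XML::Parser->new( Style => 'grove', NoExpand => 1 );
my $grove  = $parser->parsefile( shift @ARGV );   # filename from the command line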

Next, we access the contents of the grove by calling the contents( ) method. This method returns a reference to an array holding the root element along with any comments or PIs outside of it. A subroutine called tabulate( ) counts each node and descends recursively through the tree. Finally, the results are printed:

# tabulate elements and other nodes
my %dist;
foreach( @{$grove->contents} ) {
  &tabulate( $_, \%dist );
}
print "\nNODES:\n\n";
foreach( sort keys %dist ) {
  print "$_: " . $dist{$_} . "\n";
}
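The printing loop relies on Perl's default sort, which compares strings character by character, so keys starting with an uppercase letter (such as "PI (a)") come before every lowercase key. If you'd rather have case-insensitive ordering, a sort block that compares lowercased keys does it. The hash below is just a few keys and counts taken from the sample output at the end of this section, not the result of a real run.

```perl
use strict;
use warnings;

# A few keys in the shape tabulate() produces (sample data only)
my %dist = (
    'PI (a)'             => 1,
    'attribute (date)'   => 1,
    'element'            => 30,
    'element (category)' => 2,
    'text-node'          => 100,
);

# Default sort: plain string comparison, so uppercase "PI" sorts first
my @default = sort keys %dist;

# Case-insensitive alternative: compare lowercased copies of each key
my @folded = sort { lc($a) cmp lc($b) } keys %dist;

print "$_: $dist{$_}\n" for @folded;
```

Only the order changes; the counts printed for each key are the same either way.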

Here is the subroutine that handles each node in the tree. Since each node type has its own class, we can use ref( ) to get the type. Attributes are not treated as nodes in this model; instead, the element class's attributes( ) method returns them as a hash reference. The call to contents( ) lets the routine continue processing the element's children:

# given a node and a table, find out what the node is, add to the count,
# and recurse if necessary
#
sub tabulate {
  my( $node, $table ) = @_;

  my $type = ref( $node );
  if( $type eq 'XML::Grove::Element' ) {
    $table->{ 'element' }++;
    $table->{ 'element (' . $node->name . ')' }++;
    foreach( keys %{$node->attributes} ) {
      $table->{ "attribute ($_)" }++;
    }
    foreach( @{$node->contents} ) {
      &tabulate( $_, $table );
    }

  } elsif( $type eq 'XML::Grove::Entity' ) {
    $table->{ 'entity-ref (' . $node->name . ')' }++;

  } elsif( $type eq 'XML::Grove::PI' ) {
    $table->{ 'PI (' . $node->target . ')' }++;

  } elsif( $type eq 'XML::Grove::Comment' ) {
    $table->{ 'comment' }++;

  } else {
    # plain scalars (character data) and any other node class land here
    $table->{ 'text-node' }++;
  }
}
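The if/elsif chain above works, but every new node type means another branch. A common Perl alternative is a dispatch table: a hash that maps the class name returned by ref( ) to a code reference. The sketch below is that variant, not the book's code. So that it runs without XML::Grove installed, it defines small hypothetical stand-in classes under the same package names, exposing only the name( ), target( ), attributes( ), and contents( ) accessors the text describes.

```perl
use strict;
use warnings;

# --- Hypothetical stand-ins, NOT the real XML::Grove classes ---
# They expose only the accessors that tabulate() calls.
{
    package Stub::Node;
    sub new        { my ( $class, %args ) = @_; return bless +{%args}, $class }
    sub name       { $_[0]{name} }
    sub target     { $_[0]{target} }
    sub attributes { $_[0]{attributes} || {} }    # hash reference
    sub contents   { $_[0]{contents}   || [] }    # array reference
}
{ package XML::Grove::Element; our @ISA = ('Stub::Node'); }
{ package XML::Grove::Entity;  our @ISA = ('Stub::Node'); }
{ package XML::Grove::PI;      our @ISA = ('Stub::Node'); }
{ package XML::Grove::Comment; our @ISA = ('Stub::Node'); }

# Dispatch table: class name (as returned by ref) => handler code ref
my %handler = (
    'XML::Grove::Element' => sub {
        my ( $node, $table ) = @_;
        $table->{'element'}++;
        $table->{ 'element (' . $node->name . ')' }++;
        $table->{"attribute ($_)"}++ for keys %{ $node->attributes };
        tabulate( $_, $table ) for @{ $node->contents };    # recurse
    },
    'XML::Grove::Entity'  => sub { $_[1]{ 'entity-ref (' . $_[0]->name . ')' }++ },
    'XML::Grove::PI'      => sub { $_[1]{ 'PI (' . $_[0]->target . ')' }++ },
    'XML::Grove::Comment' => sub { $_[1]{'comment'}++ },
);

sub tabulate {
    my ( $node, $table ) = @_;
    my $code = $handler{ ref($node) };
    if ($code) { $code->( $node, $table ) }
    else       { $table->{'text-node'}++ }    # plain scalars and unknown classes
}

# A tiny hand-built tree: a PI, then a root element holding text,
# a child element, and a comment
my @top = (
    XML::Grove::PI->new( target => 'a' ),
    XML::Grove::Element->new(
        name       => 'inventory',
        attributes => { date => '2000-01-01' },
        contents   => [
            "\n",
            XML::Grove::Element->new( name => 'note', contents => ['hi'] ),
            XML::Grove::Comment->new,
        ],
    ),
);

my %dist;
tabulate( $_, \%dist ) for @top;
print "$_: $dist{$_}\n" for sort keys %dist;
```

With this layout, supporting another node class means adding one entry to %handler rather than another elsif branch.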

Here's a typical result from running the program on an XML datafile:

NODES:

PI (a): 1
attribute (date): 1
attribute (style): 12
attribute (type): 2
element: 30
element (category): 2
element (inventory): 1
element (item): 6
element (location): 6
element (name): 12
element (note): 3
text-node: 100