XML::Simple (Perl and XML)

6.2. XML::Simple

The simplest tree model can be found in Grant McLean's module XML::Simple. It's designed to facilitate the job of reading and saving datafiles. The programmer doesn't have to know much about XML and parsers -- only how to access arrays and hashes, the data structures used to store a document.

Example 6-1 shows a simple datafile that a program might use to store information.

Example 6-1. A program datafile

<preferences>
  <font role="default">
    <name>Times New Roman</name>
    <size>14</size>
  </font>
  <window>
    <height>352</height>
    <width>417</width>
    <locx>100</locx>
    <locy>120</locy>
  </window>
</preferences>

XML::Simple makes accessing information in the datafile remarkably easy. Example 6-2 extracts default font information from it.

Example 6-2. Program to extract font information

use strict;
use warnings;
use XML::Simple;

my $simple = XML::Simple->new( );             # initialize the object
my $tree = $simple->XMLin( './data.xml' );   # read, store document

# test access to the tree
print "The user prefers the font " . $tree->{ font }->{ name } . " at " .
    $tree->{ font }->{ size } . " points.\n";

First we initialize an XML::Simple object, then we trigger the parser with a call to its XMLin( ) method. This step returns a reference to the root of the tree, which is a hierarchical set of hashes. Element names provide keys to the hashes, whose values are either strings or references to other element hashes. Thus, we have a clear and concise way to access points deep in the document.

To illustrate this idea, let's look at the data structure, using Data::Dumper, a module that serializes data structures. Just add these lines at the end of the program:

use Data::Dumper;
print Data::Dumper->Dump( [ $tree ], [ 'tree' ] );   # label the dump $tree

And here's the output:

$tree = {
          'font' => {
                      'size' => '14',
                      'name' => 'Times New Roman',
                      'role' => 'default'
                    },
          'window' => {
                        'locx' => '100',
                        'locy' => '120',
                        'height' => '352',
                        'width' => '417'
                      }
        };

The $tree variable represents the root element of the tree, <preferences>. Each entry in the hash it points to represents one of its child elements, <font> and <window>, keyed by element name. These entries point to hashes representing the third tier of elements (plus the role attribute). Finally, the values of these hash items are strings: the text found in the actual elements from the file. The whole document is accessible through a short chain of hash lookups, such as $tree->{ font }->{ size }.
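Because every tier is an ordinary Perl hash, generic code can visit the whole document without knowing its element names in advance. The following is a minimal sketch that assumes the structure from the Dumper output above, typed in by hand so it runs without data.xml; walk( ) is a hypothetical helper, not part of XML::Simple.

```perl
use strict;
use warnings;

# Structure returned by XMLin( ) for Example 6-1, copied from the
# Dumper output so this sketch runs without data.xml.
my $tree = {
    font   => { size => '14', name => 'Times New Roman', role => 'default' },
    window => { locx => '100', locy => '120', height => '352', width => '417' },
};

# Hypothetical helper: recursively visit every hash entry and collect
# "path = value" lines, where the path mirrors the element nesting.
sub walk {
    my ( $node, $path, $out ) = @_;
    for my $key ( sort keys %$node ) {    # sort, since hash order is arbitrary
        my $value = $node->{$key};
        if ( ref $value eq 'HASH' ) {
            walk( $value, "$path/$key", $out );   # descend one tier
        }
        else {
            push @$out, "$path/$key = $value";    # leaf: element text or attribute
        }
    }
}

my @lines;
walk( $tree, '/preferences', \@lines );
print "$_\n" for @lines;
```

The first line printed is "/preferences/font/name = Times New Roman". The sorting is there only to make the output repeatable.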

This example was not very complex. Much of the success of XML::Simple's interface comes from its reliance on the XML being simple. Looking back at our datafile, you'll note that no sibling elements have the same name. Like-named siblings would be impossible to encode with hashes alone, because each hash key can hold only one value.
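To see the problem, try storing two <font> siblings in a plain hash. This is a core Perl illustration, not XML::Simple code: the second assignment silently replaces the first.

```perl
use strict;
use warnings;

my %preferences;

# Two <font> siblings, stored naively under the same key.
$preferences{font} = { role => 'console', fname => 'Courier' };
$preferences{font} = { role => 'default', fname => 'Times New Roman' };

# Only the last assignment survives; the console font is gone.
print scalar( keys %preferences ), " key(s); font role is ",
    $preferences{font}{role}, "\n";
```

This prints "1 key(s); font role is default".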

Fortunately, XML::Simple has an answer. If an element has two or more child elements with the same name, it uses a list to contain all the like-named children in a group. Consider the revised datafile in Example 6-3.

Example 6-3. A trickier program datafile

<preferences>
  <font role="console">
    <size>9</size>
    <fname>Courier</fname>
  </font>
  <font role="default">
    <fname>Times New Roman</fname>
    <size>14</size>
  </font>
  <font role="titles">
    <size>10</size>
    <fname>Helvetica</fname>
  </font>
</preferences>

We've thrown XML::Simple a curve ball. There are now three <font> elements in a row. How will XML::Simple encode that? Dumping the data structure gives us this output:

$tree = {
          'font' => [
                      {
                        'fname' => 'Courier',
                        'size' => '9',
                        'role' => 'console'
                      },
                      {
                        'fname' => 'Times New Roman',
                        'size' => '14',
                        'role' => 'default'
                      },
                      {
                        'fname' => 'Helvetica',
                        'size' => '10',
                        'role' => 'titles'
                      }
                    ]
        };

Now the font entry's value is a reference to a list of hashes, each modeling one of the <font> elements. To select a font, you must iterate through the list until you find the one you want. Grouping like-named siblings in a list solves the problem, though it costs a search each time you need a particular element.
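A minimal sketch of that search, assuming the structure from the Example 6-3 dump (typed in by hand). One wrinkle: when a document has only one <font>, as in Example 6-1, the value is a hash reference rather than a list, so the hypothetical helper find_font( ) accepts either shape. XML::Simple's ForceArray option is documented as a way to always get a list; check its documentation before relying on it.

```perl
use strict;
use warnings;

# Structure from the Example 6-3 dump, entered by hand.
my $tree = {
    font => [
        { fname => 'Courier',         size => '9',  role => 'console' },
        { fname => 'Times New Roman', size => '14', role => 'default' },
        { fname => 'Helvetica',       size => '10', role => 'titles'  },
    ],
};

# Hypothetical helper: return the first <font> hash whose role matches.
# Accepts a single hash ref (one <font>) or an array ref (several).
sub find_font {
    my ( $fonts, $role ) = @_;
    my @list = ref $fonts eq 'ARRAY' ? @$fonts : ($fonts);
    for my $font (@list) {
        return $font if $font->{role} eq $role;
    }
    return;    # no match: undef in scalar context
}

my $default = find_font( $tree->{font}, 'default' );
print "Default font: $default->{fname} at $default->{size} points.\n";
```

This prints "Default font: Times New Roman at 14 points." The linear scan is fine for a handful of fonts. For frequent lookups, you could build a hash keyed by role once and reuse it.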

This datafile also contains attributes. These attributes have been incorporated into the structure as if they were child elements of their host elements. Name clashes between attributes and child elements are possible, but XML::Simple resolves them the same way it resolves like-named sibling elements: the values are collected into a list. This is convenient, as long as you don't mind that elements and attributes are treated the same.
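A minimal sketch of such a clash, assuming XML::Simple is installed. The markup is invented for illustration: the <font> element has a role attribute and also a <role> child. It is passed to XMLin( ) as a string of literal XML rather than as a filename.

```perl
use strict;
use warnings;
use XML::Simple;

my $simple = XML::Simple->new( );

# Invented markup: an attribute and a child element both named "role".
my $font = $simple->XMLin(
    '<font role="console"><role>fallback</role><fname>Courier</fname></font>'
);

# Both values end up together in one list under the "role" key.
print "role: @{ $font->{role} }\n";
print "fname: $font->{fname}\n";
```

Nothing in the resulting list records which value came from the attribute and which from the child element.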

We know how to input XML documents to our program, but what about writing files? XML::Simple also has a method that outputs XML documents, XMLout( ). You can either modify an existing structure or create a new document from scratch by building a data structure like the ones listed above and then passing it to the XMLout( ) method.
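A minimal sketch of writing a document from scratch, assuming XML::Simple is installed. RootName and NoAttr are documented XML::Simple options. The data is invented, shaped like one <font> entry from Example 6-3.

```perl
use strict;
use warnings;
use XML::Simple;

my $simple = XML::Simple->new( );

# A new structure shaped like one <font> entry from Example 6-3.
my $tree = {
    font => { role => 'default', fname => 'Times New Roman', size => '14' },
};

# RootName replaces the default root element name (<opt>).
# By default, scalar values in a hash are written as attributes.
my $as_attrs = $simple->XMLout( $tree, RootName => 'preferences' );

# NoAttr => 1 writes every scalar as a nested element instead.
my $as_elems = $simple->XMLout( $tree, RootName => 'preferences', NoAttr => 1 );

print $as_attrs, "\n", $as_elems;
```

Neither output reproduces the original markup exactly. The first turns <fname> and <size> into attributes, and the second turns the role attribute into a <role> element. This follows from XML::Simple treating elements and attributes the same.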

Our conclusion? XML::Simple works well with simple XML documents but runs into trouble with more complex markup. It can't handle elements that have both text and elements as children (mixed content). It doesn't preserve node types other than elements, attributes, and text, such as processing instructions or CDATA sections. Because hashes don't preserve the order of items, the sequence of elements may be scrambled. If none of these problems matters to you, then use XML::Simple. It will serve your needs well, minimizing the pain of XML markup and keeping your data accessible.