22.2. Parsing XML into a DOM Tree

22.2.1. Problem

You want to use the Document Object Model (DOM) to access and perhaps change the parse tree of an XML file.

22.2.2. Solution

Use the XML::LibXML module from CPAN:

    use XML::LibXML;
    my $parser = XML::LibXML->new( );
    my $dom    = $parser->parse_string($XML);
    # or
    my $dom    = $parser->parse_file($FILENAME);
    my $root   = $dom->getDocumentElement;

22.2.3. Discussion

DOM is a framework of classes for representing XML parse trees. Each element is a node in the tree, and each node supports operations such as finding its child nodes (the XML elements, in this case), adding another child node, and moving the node elsewhere in the tree.

The parse_string, parse_file, and parse_fh (filehandle) methods all return a DOM document object that you can use to find nodes in the tree. For example, given the books XML from Example 22-1, Example 22-2 shows one way to print the titles.

Example 22-2. dom-titledumper

    #!/usr/bin/perl -w
    # dom-titledumper -- display titles in books file using DOM

    use XML::LibXML;
    use Data::Dumper;
    use strict;

    my $parser = XML::LibXML->new;
    my $dom = $parser->parse_file("books.xml") or die;

    # get all the title elements
    my @titles = $dom->getElementsByTagName("title");
    foreach my $t (@titles) {
        # get the text node inside the <title> element, and print its value
        print $t->firstChild->data, "\n";
    }

The getElementsByTagName method returns a list of the element nodes in the document that have the given tag name. Here we get a list of the title elements, then go through each title to find its contents. We know that each title has only a single piece of text, so we assume the first child node is text and print its contents.

If we wanted to confirm that the node was a text node, we could say:

    die "the title contained something other than text!"
        if $t->firstChild->nodeType != 3;

This ensures that the first node is of type 3 (text). Table 22-1 shows LibXML's numeric node types, which the nodeType method returns.

Table 22-1. LibXML's numeric node types
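
The numeric check against 3 shown earlier can also be written with the named constants that XML::LibXML provides through its :libxml export tag. The following is a minimal sketch rather than part of the original recipe: the inline sample XML stands in for books.xml, and leading_text is a hypothetical helper introduced only for illustration. It falls back to textContent when the first child turns out not to be a text node (here, a comment).

```perl
#!/usr/bin/perl
# nodetype-check -- sketch: test node types by name instead of by number
use strict;
use warnings;
use XML::LibXML qw(:libxml);    # imports XML_TEXT_NODE and the other type constants

# Inline sample data (an assumption; stands in for books.xml)
my $xml = <<'XML';
<books>
  <book><title>Programming Perl</title></book>
  <book><title><!-- draft -->Perl Cookbook</title></book>
</books>
XML

my $dom = XML::LibXML->new->parse_string($xml);

# Hypothetical helper: return the element's leading text,
# or undef if its first child is missing or is not a text node.
sub leading_text {
    my ($element) = @_;
    my $first = $element->firstChild;
    return undef unless defined $first;
    return undef unless $first->nodeType == XML_TEXT_NODE;    # numeric value 3
    return $first->data;
}

foreach my $t ($dom->getElementsByTagName("title")) {
    my $text = leading_text($t);
    if (defined $text) {
        print "$text\n";
    }
    else {
        # textContent gathers the text of all descendant text nodes
        print $t->textContent, " (first child was not a text node)\n";
    }
}
```

Comparing against XML_TEXT_NODE reads more clearly than a bare 3, and the textContent fallback avoids calling data on a node that may not support it.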
You can also create and insert new nodes, or move and delete existing ones, to change the parse tree. Example 22-3 shows how you would add a randomly generated price value to each book element.

Example 22-3. dom-addprice

    #!/usr/bin/perl -w
    # dom-addprice -- add price element to books

    use XML::LibXML;
    use Data::Dumper;
    use strict;

    my $parser = XML::LibXML->new;
    my $dom = $parser->parse_file("books.xml") or die;
    my $root = $dom->documentElement;

    # get list of all the "book" elements
    my @books = $root->getElementsByTagName("book");

    foreach my $book (@books) {
        my $price = sprintf("\$%d.95", 19 + 5 * int rand 5);  # random price
        my $price_text_node = $dom->createTextNode($price);   # contents of <price>
        my $price_element   = $dom->createElement("price");   # create <price>
        $price_element->appendChild($price_text_node);        # put contents into <price>
        $book->appendChild($price_element);                   # put <price> into <book>
    }

    print $dom->toString;

We use createTextNode and createElement to build the new price element and its contents. Then we use appendChild to add the new element to the end of the current book element's existing contents. The toString method emits a document as XML, which lets you easily write XML filters like this one using DOM.

The XML::LibXML::DOM manpage gives a quick introduction to the features of XML::LibXML's DOM support and references the manpages for the DOM classes (e.g., XML::LibXML::Node). Those manpages list the methods for the objects.

22.2.4. See Also

The documentation for the XML::LibXML::DOM, XML::LibXML::Document, XML::LibXML::Element, and XML::LibXML::Node modules

Copyright © 2003 O'Reilly & Associates. All rights reserved.