Many people, understandably, think of XML as the invention of an evil
genius bent on destroying humanity. The embedded markup, with its
angle brackets and slashes, is not exactly a treat for the eyes. Add
to that the business about nested elements, node types, and DTDs, and
you might cower in the corner and whimper for nice, tab-delimited
files and a split function.
Here's a little secret: writing programs to process
XML is not hard. A whole spectrum of tools is available to handle the
mundane details of parsing and building data structures for you,
with convenient APIs that get you started in a few minutes. If you
really need the complexity of a full-featured XML application, you
can certainly get it, but you don't have to. XML
scales nicely from simple to bafflingly complex, and if you deal with
XML on the simple end of the continuum, you can pick simple tools to
help you.
A typical program reads in an XML document, makes some changes, and
writes it back out to a file. XML::Simple was
created to automate this process as much as possible. One subroutine
call reads in an XML document and stores it in memory for you, using
nested hashes to represent elements and data. After you make whatever
changes you need to make, call another subroutine to print it out to
a file.
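To make "nested hashes" concrete, here is a hand-built structure shaped roughly like what XMLin returns for a one-customer document when its ForceArray option is turned on: the root element is dropped, element names become hash keys, and each element's content sits in an array reference. No XML is parsed here, and leaf_paths is a hypothetical helper written only for this sketch; it walks the structure and lists each element path with its text.

```perl
use strict;
use warnings;

# Hand-built stand-in (no XML is parsed) for a document such as
#   <opt><customer><first-name>Joe</first-name>
#   <address><city>Meatball</city><state>MI</state></address></customer></opt>
# read with ForceArray => 1: the root <opt> is dropped, element names
# become hash keys, and each element's content sits in an array ref.
my $doc = {
    customer => [
        {
            'first-name' => ['Joe'],
            address      => [ { city => ['Meatball'], state => ['MI'] } ],
        },
    ],
};

# Hypothetical helper: return "path = text" for every leaf, visiting
# hash keys in sorted order so the result is predictable.
sub leaf_paths {
    my ($node, $path) = @_;
    if (ref $node eq 'HASH') {
        return map { leaf_paths($node->{$_}, $path eq '' ? $_ : "$path/$_") }
            sort keys %$node;
    }
    if (ref $node eq 'ARRAY') {
        return map { leaf_paths($_, $path) } @$node;
    }
    return "$path = $node";
}

print "$_\n" for leaf_paths($doc, '');
```

The nesting mirrors the element tree: to reach the city you go through the customer array, into the address array, and down to the city text.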
Let's try it out. As with any module, you have to
introduce XML::Simple to your program with a
use statement like this:
use XML::Simple;
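When you do this, XML::Simple exports two
subroutines into your namespace:
XMLin() reads an XML document from a file or a string and builds a
data structure to hold its elements and data, returning a reference
to a hash.
XMLout() takes a reference to a hash like the one XMLin() returns,
generates XML markup from it, and returns the markup as a string.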
If you like, you can build the document from scratch by simply
creating the data structures from hashes, arrays, and strings and
passing the result to XMLout(). You'd have to do that if you wanted
to create a file for the first time. Just be careful to avoid circular
references: XMLout() cannot turn a structure that refers back to
itself into XML.
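Here is a sketch of building a new mailing list by hand. The field values are sample data, has_cycle is a hypothetical helper (not part of XML::Simple) that checks for the circular references mentioned above, and the final XMLout call runs only if XML::Simple happens to be installed.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# A new mailing list built entirely by hand (sample values only).
# Plain scalars, like 'version', are written out by XMLout as attributes;
# array refs of strings are written out as child elements.
my $list = {
    version  => '3.5',
    customer => [
        {
            'first-name' => ['Joe'],
            surname      => ['Wrigley'],
            email        => ['i-like-cheese@jmac.org'],
        },
    ],
};

# Hypothetical helper (not part of XML::Simple): returns true if some
# hash or array ref can be reached again from inside itself.
# Shared but non-circular references are not reported.
sub has_cycle {
    my ($node, $on_path) = @_;
    $on_path ||= {};
    my $type = ref $node;
    return 0 unless $type eq 'HASH' || $type eq 'ARRAY';
    my $addr = refaddr $node;
    return 1 if $on_path->{$addr};
    local $on_path->{$addr} = 1;    # mark as on the current path while descending
    for my $child ($type eq 'HASH' ? values %$node : @$node) {
        return 1 if has_cycle($child, $on_path);
    }
    return 0;
}

die "circular reference in mailing list\n" if has_cycle($list);

# Serialize only if XML::Simple is actually installed.
if (eval { require XML::Simple; 1 }) {
    print XML::Simple::XMLout($list);
}
```

Checking the whole structure once before serializing is cheap for data of this size and gives a clearer error than a failure partway through writing.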
For example, let's say your boss is going to send
email to a group of people using the world-renowned mailing list
management application, WarbleSoft SpamChucker. Among its features is
the ability to import and export XML files representing mailing
lists. The only problem is that the boss has trouble reading
customers' names as they are displayed on the screen
and would prefer that they all be in capital letters. Your assignment
is to write a program that can edit the XML datafiles to convert just
the names into all caps.
Example 1-2. A script to capitalize customer names
# This program capitalizes all the customer names in an XML document
# made by WarbleSoft SpamChucker.

# Turn on strict and warnings, for it is always wise to do so (usually)
use strict;
use warnings;

# Import the XML::Simple module
use XML::Simple;

# Turn the file into a hash reference, using XML::Simple's "XMLin"
# subroutine.
# We'll also turn on the 'ForceArray' option, so that every element's
# content is stored in an array ref, even if the element appears only once.
my $cust_xml = XMLin('./customers.xml', ForceArray => 1);

# Loop over each customer sub-hash; they are all stored in an
# anonymous array under the 'customer' key
for my $customer (@{$cust_xml->{customer}}) {
    # Capitalize the contents of the 'first-name' and 'surname' elements
    # by running Perl's built-in uc() function on them
    foreach (qw(first-name surname)) {
        $customer->{$_}->[0] = uc($customer->{$_}->[0]);
    }
}

# Print out the hash as an XML document again, with a trailing newline
# for good measure
print XMLout($cust_xml);
print "\n";
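Why turn on ForceArray? Without it, XMLin stores an element that appears only once, such as first-name, as a plain string rather than an array reference, so `$customer->{$_}->[0]` would die under strict refs, and XMLout would write those plain strings back out as attributes instead of child elements. The hand-built structures below (no XML is parsed) contrast the two shapes; uc_field is a hypothetical helper that copes with either.

```perl
use strict;
use warnings;

# Hand-built stand-ins (no XML is parsed) for two <customer> records.
# With ForceArray => 1, each element's text is wrapped in an array ref:
my $forced = {
    customer => [
        { 'first-name' => ['Joe'],       surname => ['Wrigley'] },
        { 'first-name' => ['Henrietta'], surname => ['Pussycat'] },
    ],
};

# Without it, an element that appears only once is stored as a plain string:
my $plain = {
    customer => [
        { 'first-name' => 'Joe',       surname => 'Wrigley' },
        { 'first-name' => 'Henrietta', surname => 'Pussycat' },
    ],
};

# Hypothetical helper: upper-case one field in either shape.
sub uc_field {
    my ($record, $field) = @_;
    if (ref $record->{$field} eq 'ARRAY') {
        $_ = uc($_) for @{ $record->{$field} };
    }
    else {
        $record->{$field} = uc $record->{$field};
    }
}

for my $data ($forced, $plain) {
    for my $customer (@{ $data->{customer} }) {
        uc_field($customer, $_) for qw(first-name surname);
    }
}

print $forced->{customer}[0]{'first-name'}[0], "\n";   # JOE
print $plain->{customer}[1]{surname}, "\n";            # PUSSYCAT
```

With ForceArray on, every element has the same shape, so the script in the example can use `->[0]` everywhere without checking first.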
Running the program (a little trepidatiously, perhaps, since the data
belongs to your boss), you get this output:
<opt version="3.5" timestamp="2002-05-13 15:33:45">
  <customer>
    <address>
      <state>MI</state>
      <zip>82649</zip>
      <city>Meatball</city>
      <street>17 Beable Ave.</street>
    </address>
    <first-name>JOE</first-name>
    <email>i-like-cheese@jmac.org</email>
    <surname>WRIGLEY</surname>
    <age>42</age>
  </customer>
  <customer>
    <address>
      <state>NY</state>
      <zip>83642</zip>
      <city>Flangerville</city>
      <street>R.F.D. 2</street>
    </address>
    <first-name>HENRIETTA</first-name>
    <email>meowmeow@augh.org</email>
    <surname>PUSSYCAT</surname>
    <age>37</age>
  </customer>
</opt>
Congratulations! You've written an XML-processing
program, and it worked perfectly. Well, almost perfectly. The output
is a little different from what you might have expected. For one
thing, the elements are in a different order, since Perl hashes
don't preserve the order in which their keys were added.
Also, the spacing between elements may be off. Could this be a
problem?
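As a core-Perl illustration of the ordering issue, the sketch below shows that a hash forgets the order its pairs were written in, along with one common workaround: keeping the desired order in a separate array. elements_in_order is a hypothetical helper that does no XML escaping, so treat it as a demonstration rather than a replacement for XMLout.

```perl
use strict;
use warnings;

# One customer, written with its pairs in the order they appear in the file.
my %customer = (
    'first-name' => 'JOE',
    surname      => 'WRIGLEY',
    email        => 'i-like-cheese@jmac.org',
    age          => 42,
);

# A hash does not remember that order: keys() returns an internal order
# that may change between perl versions, or even between runs.
print join(', ', keys %customer), "\n";

# One workaround: record the order you want in an array.
my @field_order = ('first-name', 'surname', 'email', 'age');

# Hypothetical helper: render simple elements in the given order,
# skipping fields the record lacks. It does no escaping of <, > or &,
# so it is only suitable for values known to be plain text.
sub elements_in_order {
    my ($record, @order) = @_;
    return map { "<$_>$record->{$_}</$_>" } grep { exists $record->{$_} } @order;
}

print "$_\n" for elements_in_order(\%customer, @field_order);
```

The first print may list the keys in any order; the second always follows @field_order.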