Example 5-6. Excel parsing program
use strict;
use warnings;
use XML::SAXDriver::Excel;

# get the file name to process
die( "Must specify an input file\n" ) unless( @ARGV );
my $file = shift @ARGV;
print "Parsing $file...\n";

# initialize the parser
my $handler = Excel_SAX_Handler->new;
my %props = ( Source  => { SystemId => $file },
              Handler => $handler );
my $driver = XML::SAXDriver::Excel->new( %props );

# start parsing
$driver->parse( %props );

# The handler package we define to print out the XML
# as we receive SAX events.
package Excel_SAX_Handler;

# constructor: bless any key/value options into a handler object
sub new {
    my $type = shift;
    my $self = {@_};
    return bless( $self, $type );
}

# create the outermost element
sub start_document {
    print "<doc>\n";
}

# end the document element
sub end_document {
    print "</doc>\n";
}

# handle any character data, skipping undefined values
sub characters {
    my( $self, $properties ) = @_;
    my $data = $properties->{'Data'};
    print $data if defined($data);
}

# start a new element, outputting the start tag
sub start_element {
    my( $self, $properties ) = @_;
    my $name = $properties->{'Name'};
    print "<$name>";
}

# end the current element, outputting the end tag
sub end_element {
    my( $self, $properties ) = @_;
    my $name = $properties->{'Name'};
    print "</$name>";
}
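The handler in Example 5-6 is an ordinary Perl class, so its methods can be tried out without XML::SAXDriver::Excel or a spreadsheet by calling them directly with hash references shaped like the ones the driver passes (a Name key for elements, a Data key for text). The sketch below does that with a hypothetical variant, Escaping_Handler, that also escapes &, <, and > in character data; the listing prints cell text verbatim, so a cell containing & would yield malformed XML. The package name and the xml_escape helper are illustrative assumptions, not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical variant of Excel_SAX_Handler whose characters() method
# escapes XML special characters before printing them.
package Escaping_Handler;

sub new {
    my $type = shift;
    my $self = {@_};
    return bless( $self, $type );
}

# Replace &, <, and > with entity references. The & substitution runs
# first so the ampersands added by the later substitutions are not re-escaped.
sub xml_escape {
    my $text = shift;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

sub start_document { print "<doc>\n"; }
sub end_document   { print "</doc>\n"; }

# print escaped character data, skipping undefined values
sub characters {
    my( $self, $properties ) = @_;
    my $data = $properties->{'Data'};
    print xml_escape($data) if defined($data);
}

sub start_element {
    my( $self, $properties ) = @_;
    my $name = $properties->{'Name'};
    print "<$name>";
}

sub end_element {
    my( $self, $properties ) = @_;
    my $name = $properties->{'Name'};
    print "</$name>";
}

package main;

# Drive the handler by hand with driver-style hash references,
# so no spreadsheet or SAX driver is needed.
my $h = Escaping_Handler->new;
$h->start_document;
$h->start_element( { Name => 'column1' } );
$h->characters( { Data => 'balls & bats' } );
$h->end_element( { Name => 'column1' } );
print "\n";
$h->end_document;
```

Run as-is, it prints <doc>, then <column1>balls &amp; bats</column1>, then </doc>, each on its own line.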
As you can see, the handler methods look very similar to those used
in the previous SAX example; only what we do with their arguments
has changed. Now let's see what the output looks like when we run
the program on the test file:
<doc>
<records>
<record>
<column1>baseballs</column1>
<column2>55</column2>
</record>
<record>
<column1>tennisballs</column1>
<column2>33</column2>
</record>
<record>
<column1>pingpong balls</column1>
<column2>12</column2>
</record>
<record>
<column1>footballs</column1>
<column2>77</column2>
</record>
<record>
Use of uninitialized value in print at conv line 39.
<column1></column1>
Use of uninitialized value in print at conv line 39.
<column2></column2>
</record>
</records></doc>
The driver did most of the work of creating elements and formatting
the data; all we did was print the information it passed to us
through method calls. It wrapped the whole document in
<records>, making our use of
<doc> superfluous. (In the next revision of
the code, we'll make the start_document() and end_document() methods
output nothing.) Each row of the spreadsheet is encapsulated in a
<record> element, and the two columns are
distinguished by <column1> and
<column2> elements. All in all, not a bad job.
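The revision mentioned above can be sketched as a handler whose start_document() and end_document() methods are empty, so the driver's <records> element becomes the root. Because the real driver needs an .xls file, this sketch feeds the handler with simulate_driver, a hypothetical stand-in that emits the same records, record, columnN event pattern visible in the output. It is an assumption for illustration, not how XML::SAXDriver::Excel works internally, and it sends no whitespace character data, so everything it prints lands on one line.

```perl
use strict;
use warnings;

# Sketch of the next revision: no <doc> wrapper, because the driver
# already supplies a <records> root element.
package Quiet_Excel_Handler;

sub new {
    my $type = shift;
    my $self = {@_};
    return bless( $self, $type );
}

sub start_document { }    # intentionally outputs nothing
sub end_document   { }    # intentionally outputs nothing

sub characters {
    my( $self, $properties ) = @_;
    my $data = $properties->{'Data'};
    print $data if defined($data);
}

sub start_element {
    my( $self, $properties ) = @_;
    my $name = $properties->{'Name'};
    print "<$name>";
}

sub end_element {
    my( $self, $properties ) = @_;
    my $name = $properties->{'Name'};
    print "</$name>";
}

package main;

# Hypothetical driver stand-in: one <record> per row and one
# <columnN> element per cell, all inside <records>.
sub simulate_driver {
    my( $handler, @rows ) = @_;
    $handler->start_document( {} );
    $handler->start_element( { Name => 'records' } );
    for my $row (@rows) {
        $handler->start_element( { Name => 'record' } );
        for my $i ( 0 .. $#$row ) {
            my $col = 'column' . ( $i + 1 );
            $handler->start_element( { Name => $col } );
            $handler->characters( { Data => $row->[$i] } );
            $handler->end_element( { Name => $col } );
        }
        $handler->end_element( { Name => 'record' } );
    }
    $handler->end_element( { Name => 'records' } );
    $handler->end_document( {} );
}

simulate_driver( Quiet_Excel_Handler->new,
                 [ 'baseballs', 55 ], [ 'tennisballs', 33 ] );
print "\n";
```

Run as shown, it prints a single <records> element containing two <record> elements, with no <doc> wrapper around it.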