Drivers for Non-XML Sources (Perl and XML)

Example 5-6. Excel parsing program

use XML::SAXDriver::Excel;

# get the file name to process
die( "Must specify an input file" ) unless( @ARGV );
my $file = shift @ARGV;
print "Parsing $file...\n";

# initialize the parser
my $handler = Excel_SAX_Handler->new;
my %props = ( Source => { SystemId => $file },
              Handler => $handler );
my $driver = XML::SAXDriver::Excel->new( %props );

# start parsing
$driver->parse( %props );

# The handler package we define to print out the XML
# as we receive SAX events.
package Excel_SAX_Handler;

# initialize the package
sub new {
    my $type = shift;
    my $self = {@_};
    return bless( $self, $type );
}

# create the outermost element
sub start_document {
    print "<doc>\n";
}

# end the document element
sub end_document {
    print "</doc>\n";
}

# handle any character data
sub characters {
    my( $self, $properties ) = @_;
    my $data = $properties->{'Data'};
    print $data if defined($data);
}

# start a new element, outputting the start tag
sub start_element {
    my( $self, $properties ) = @_;
    my $name = $properties->{'Name'};
    print "<$name>";
}

# end the new element
sub end_element {
    my( $self, $properties ) = @_;
    my $name = $properties->{'Name'};
    print "</$name>";
}

As you can see, the handler methods look very similar to those used in the previous SAX example. All that has changed is what we do with the arguments. Now let's see what the output looks like when we run it on the test file:

<doc>
<records>
  <record>
    <column1>baseballs</column1>
    <column2>55</column2>
  </record>
  <record>
    <column1>tennisballs</column1>
    <column2>33</column2>
  </record>
  <record>
    <column1>pingpong balls</column1>
    <column2>12</column2>
  </record>
  <record>
    <column1>footballs</column1>
    <column2>77</column2>
  </record>
  <record>
Use of uninitialized value in print at conv line 39.
    <column1></column1>
Use of uninitialized value in print at conv line 39.
    <column2></column2>
  </record>
</records></doc>

The driver did most of the work of creating elements and formatting the data. All we did was print what it handed us through method calls. It wrapped the whole document in <records>, which makes our use of <doc> superfluous. (In the next revision of the code, we'll make the start_document( ) and end_document( ) methods output nothing.) Each row of the spreadsheet is encapsulated in a <record> element, and the two columns are distinguished by <column1> and <column2> labels. All in all, not a bad job.
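As a rough illustration of that revision, here is a minimal sketch of a handler whose start_document( ) and end_document( ) methods do nothing. The package name Quiet_Excel_Handler and the `out` buffer are hypothetical, not part of the original listing. To keep the sketch runnable without a spreadsheet, the event calls at the bottom are written by hand as a stand-in for what the driver would send; the real driver's event sequence may differ.

```perl
use strict;
use warnings;

# Hypothetical revised handler: the driver already supplies <records>
# as the root element, so the document-level callbacks stay silent.
package Quiet_Excel_Handler;

sub new {
    my $type = shift;
    my $self = { out => '', @_ };   # 'out' collects the generated XML
    return bless( $self, $type );
}

# intentionally empty: no <doc> wrapper is emitted
sub start_document { }
sub end_document   { }

sub start_element {
    my( $self, $properties ) = @_;
    $self->{out} .= "<$properties->{'Name'}>";
}

sub end_element {
    my( $self, $properties ) = @_;
    $self->{out} .= "</$properties->{'Name'}>";
}

# append character data, skipping undefined values from empty cells
sub characters {
    my( $self, $properties ) = @_;
    my $data = $properties->{'Data'};
    $self->{out} .= $data if defined $data;
}

package main;

# Hand-written stand-in for the driver (an assumption for this sketch):
# one filled row and one empty row, two columns each.
my $handler = Quiet_Excel_Handler->new;
$handler->start_document( {} );
$handler->start_element( { Name => 'records' } );
for my $row ( [ 'baseballs', 55 ], [ undef, undef ] ) {
    $handler->start_element( { Name => 'record' } );
    my $n = 0;
    for my $cell ( @$row ) {
        $n++;
        $handler->start_element( { Name => "column$n" } );
        $handler->characters( { Data => $cell } );
        $handler->end_element( { Name => "column$n" } );
    }
    $handler->end_element( { Name => 'record' } );
}
$handler->end_element( { Name => 'records' } );
$handler->end_document( {} );

my $xml = $handler->{out};
print "$xml\n";
```

Collecting the output in a buffer rather than printing from each method is only a convenience for inspecting the result. Printing directly, as the original handler does, would work the same way.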
