10.4. A Comics Index

XSLT is one thing, but the potential for Perl, XML, and the Web working together is as unlimited as, well, anything else you might choose to do with Perl and the Web. Sometimes you can't just toss refactored XML at your clients; you must write Perl that wrings interesting information out of XML documents and builds something Webbish out of the results. We did a little of that in the previous example, mixing raw XSLT transformation of the DocBook documents with index page generation.

Since we've gone through all the trouble of covering syndication-enabling XML technologies such as RSS and ComicsML in this chapter and Chapter 9, "RSS, SOAP, and Other XML Applications ", let's write a little program that uses web syndication. To prove (or perhaps belabor) a point, we'll construct a simple CGI program that builds an index of the user's favorite online comics (which, in our fantasy world, all have ComicsML documents associated with them):

#!/usr/bin/perl

# A very simple ComicsML muncher; given a list of URLs pointing to
# ComicsML documents, fetch them, flatten their strips into one list,
# and then build a web page listing, linking to, and possibly
# displaying these strips, sorted with newest first.

use warnings;
use strict;

use XML::ComicsML;                # ...so that we can build ComicsML objects
use CGI qw(:standard);
use LWP;
use Date::Manip;             # Cuz we're too bloody lazy to do our own date math

# Let's assume that the URLs of my favorite Internet funnies' ComicsML
# documents live in a plaintext file on disk, with one URL per line
# (What, no XML? For shame...)

my $url_file = $ARGV[0] or die "Usage: $0 url-file\n";

my @urls;                        # List of ComicsML URLs
open (URLS, $url_file) or die "Can't read $url_file: $!\n";
while (<URLS>) { chomp; push @urls, $_; }
close (URLS) or die "Can't close $url_file: $!\n";

# Make an LWP user agent
my $ua = LWP::UserAgent->new;
my $parser = XML::ComicsML->new;

my @strips; # This will hold objects representing comic strips

foreach my $url (@urls) {
  my $request = HTTP::Request->new(GET=>$url);
  my $result = $ua->request($request);
  my $comic;                        # Will hold the comic we'll get back
  if ($result->is_success) {
    # Let's see if the ComicsML parser likes it.
    unless ($comic = $parser->parse_string($result->content)) {
      # Doh, this is not a good XML document.
      warn "The document at $url is not good XML!\n";
      next;
    }
  } else {
    warn "Error at $url: " . $result->status_line . "\n";
    next;
  }
  # Now peel all the strips out of the comic, pop each into a little
  # hashref along with some information about the comic itself.
  foreach my $strip ($comic->strips) {
    push (@strips, {strip=>$strip, comic_title=>$comic->title, comic_url=>$comic->url});
  }
}

# Sort the list of strips by date.  (We use Date::Manip's exported
# UnixDate function here, to turn their unwieldy Gregorian calendar
# dates into nice clean Unixy ones)
my @sorted = sort {UnixDate($$a{strip}->date, "%s") <=> UnixDate($$b{strip}->date, "%s")} @strips;

# Now we build a web page!

print header;
print start_html("Latest comix");
print h1("Links to new comics...");

# Go through the sorted list in reverse, to get the newest at the top.
foreach my $strip_info (reverse(@sorted)) {
  my ($title, $url, $svg);
  my $strip = $$strip_info{strip};
  $title = join (" - ", $strip->title, $strip->date);
  # Hyperlink the title to a URL, if there is one provided
  if ($url = $strip->url) {
    $title = "<a href='$url'>$title</a>";
  }

  # Give similar treatment to the comics' title and URL
  my $comic_title = $$strip_info{comic_title};
  if ($$strip_info{comic_url}) {
    $comic_title = "<a href='$$strip_info{comic_url}'>$comic_title</a>";
  }

  # Print the titles
  print p("<b>$comic_title</b>: $title");
  
  print "<hr />";
}

print end_html;

Given the trouble we went through with that Apache::DocBook trifle a little earlier, this program might seem a tad too simple; it performs no caching, it contains no governors for how many strips it will process, and its sense of web page layout isn't much to write home about. For a quick hack, though, it works great and demonstrates the benefit of using helper modules like XML::ComicsML.
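The caching gap, at least, is easy to patch over. Here's a minimal sketch of a fetch_with_cache routine that the main loop could call in place of its HTTP::Request/request pair; the cache directory, the one-hour lifetime, and the routine's name are all our own inventions, not anything the program above (or ComicsML) requires:

#!/usr/bin/perl

# A minimal caching sketch (not part of the program above): keep a
# copy of each fetched ComicsML document on disk and reuse it if it's
# less than an hour old.

use warnings;
use strict;

use LWP::UserAgent;
use Digest::MD5 qw(md5_hex);

my $cache_dir = '/tmp/comics-cache';    # hypothetical cache location
my $max_age   = 60 * 60;                # one hour, in seconds
my $ua        = LWP::UserAgent->new;

# Return the document at $url, reading from the cache when the cached
# copy is fresh enough, and refreshing the cache otherwise.
sub fetch_with_cache {
    my $url = shift;
    mkdir $cache_dir unless -d $cache_dir;
    my $cache_file = "$cache_dir/" . md5_hex($url);
    if (-e $cache_file and (time - (stat $cache_file)[9]) < $max_age) {
        open (my $fh, '<', $cache_file) or die "Can't read $cache_file: $!\n";
        my $content = do { local $/; <$fh> };
        close $fh;
        return $content;
    }
    my $result = $ua->get($url);
    return unless $result->is_success;
    open (my $fh, '>', $cache_file) or die "Can't write $cache_file: $!\n";
    print $fh $result->content;
    close $fh;
    return $result->content;
}

With something like this in place, the foreach loop could say my $content = fetch_with_cache($url) and hand that string straight to $parser->parse_string, skipping comics whose servers are down and not re-fetching anything it already grabbed within the hour.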

Our whirlwind tour of the world of Perl and XML ends here. As we said at the start of this book, the relationship between these two technologies is still young, and it only began to reach its full potential while we were writing, as new parsers like XML::LibXML and philosophies like PerlSAX2 emerged onto the scene. We hope that we have given you enough information and encouragement to become part of this scene, as it will continue to unfold in increasingly interesting directions in the coming years.

<aloha>Happy hacking!</aloha>


