XML::RSS (Perl and XML)

9.2.2. Using XML::RSS

The XML::RSS module is useful whether you're coming or going. It can parse RSS documents that you hand it, or it can help you write your own RSS documents. Naturally, you can combine these abilities to parse a document, modify it, and then write it out again; the module uses a simple and well-documented object model to represent documents in memory, just like the tree-based modules we've seen so far. You can think of this sort of XML helper module as a tricked-out version of a familiar general XML tool.
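As a rough illustration of the "going" half, here is a minimal sketch that builds a one-item document in memory and renders it as a string. The channel and item values (and the example.org URLs) are placeholders invented for this sketch; the calls used are XML::RSS's documented new, channel, add_item, and as_string methods.

```perl
use strict;
use warnings;
use XML::RSS;

# Create an empty RSS 1.0 document object.
my $rss = XML::RSS->new(version => '1.0');

# Describe the channel. All values here are placeholders.
$rss->channel(
    title       => 'Example Log',
    link        => 'http://www.example.org/log/',
    description => 'A placeholder channel for this sketch',
);

# Add a single item to the channel.
$rss->add_item(
    title       => 'First entry',
    link        => 'http://www.example.org/log?1',
    description => 'Hello, world.',
);

# Render the in-memory document as XML text.
my $xml = $rss->as_string;
print $xml;
```

The object holds the whole document in memory until as_string is called, which is what makes the parse, modify, and write cycle possible.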

In the following examples, we'll work with a notional web log, a frequently updated and Web-readable personal column or journal. RSS lends itself to web logs, letting them quickly summarize their most recent entries within a single RSS document.

Here are a couple of web log entries (admittedly sampling from the shallow end of the concept's notional pool, but it works for short examples). First, here is how one might look in a web browser:

Oct 18, 2002 19:07:06

Today I asked lab monkey 45-X how he felt about his recent chess
victory against Dr. Baker. He responded by biting my kneecap. (The
monkey did, I mean.) I think this could lead to a communications
breakthrough. As well as painful swelling, which is unfortunate.

Oct 27, 2002 22:56:11

On a tangential note, Dr. Xing's research of purple versus green monkey
trans-sociopolitical impact seems to be stalled, having gained no
ground for several weeks. Today she learned that her lab assistant
never mentioned on his job application that he was colorblind. Oh well.

Here it is again, as an RSS v1.0 document:

<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns="http://purl.org/rss/1.0/"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/"
 xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
>

<channel rdf:about="http://www.jmac.org/linklog/">
<title>Link's Log</title>
<link>http://www.jmac.org/linklog/</link>
<description>Dr. Lance Link's online research journal</description>
<dc:language>en-us</dc:language>
<dc:rights>Copyright 2002 by Dr. Lance Link</dc:rights>
<dc:date>2002-10-27T23:59:15+05:00</dc:date>
<dc:publisher>llink@jmac.org</dc:publisher>
<dc:creator>llink@jmac.org</dc:creator>
<dc:subject>llink</dc:subject>
<syn:updatePeriod>daily</syn:updatePeriod>
<syn:updateFrequency>1</syn:updateFrequency>
<syn:updateBase>2002-03-03T00:00:00+05:00</syn:updateBase>
<items>
 <rdf:Seq>
  <rdf:li rdf:resource="http://www.jmac.org/linklog?2002-10-27#22:56:11" />
  <rdf:li rdf:resource="http://www.jmac.org/linklog?2002-10-18#19:07:06" />
 </rdf:Seq>
</items>
</channel>

<item rdf:about="http://www.jmac.org/linklog?2002-10-27#22:56:11">
<title>2002-10-27 22:56:11</title>
<link>http://www.jmac.org/linklog?2002-10-27#22:56:11</link>
<description>
On a tangential note, Dr. Xing's research of purple versus green monkey
trans-sociopolitical impact seems to be stalled, having gained no
ground for several weeks. Today she learned that her lab assistant
never mentioned on his job application that he was colorblind. Oh well.
</description>
</item>

<item rdf:about="http://www.jmac.org/linklog?2002-10-18#19:07:06">
<title>2002-10-18 19:07:06</title>
<link>http://www.jmac.org/linklog?2002-10-18#19:07:06</link>
<description>
Today I asked lab monkey 45-X how he felt about his recent chess
victory against Dr. Baker. He responded by biting my kneecap. (The
monkey did, I mean.) I think this could lead to a communications
breakthrough. As well as painful swelling, which is unfortunate.
</description>
</item>

</rdf:RDF>

Note RSS 1.0's use of various metadata-enabling namespaces before it gets into the meat of laying out the actual content.[30] The curious may wish to point their web browsers at the URIs with which they identify themselves, since they are good little namespaces who put their documentation where their mouth is. ("dc" is the Dublin Core, a standard set of elements for describing a document's source. "syn" points to a syndication namespace -- itself a sub-project by the RSS people -- holding a handful of elements that state how often a source refreshes itself with new content.) Then the whole document is wrapped up in an RDF element.

[30] I am careful to specify the RSS version here because RSS versions 0.9 and 0.91 documents are much simpler in structure, eschewing namespaces and RDF-encapsulated metadata in favor of a simple list of <item> elements wrapped in an <rss> element. For this reason, many people prefer to use pre-1.0 RSS, and socially astute RSS software can read from and write to all these versions. XML::RSS can do this, and as a side effect, allows easy conversion between these different versions (given a single original document).
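The conversion mentioned in the footnote can be sketched as follows. This is only an illustration under stated assumptions: the inlined RSS 0.91 document is a cut-down, hypothetical version of Dr. Link's feed, and the conversion relies on setting the object's output field, which the XML::RSS documentation describes as the way to choose the version that as_string emits.

```perl
use strict;
use warnings;
use XML::RSS;

# A small RSS 0.91 document, inlined so the sketch needs no file on disk.
my $old_style = <<'END_RSS';
<?xml version="1.0" encoding="UTF-8"?>
<rss version="0.91">
 <channel>
  <title>Research Log</title>
  <link>http://www.jmac.org/linklog/</link>
  <description>An online research journal</description>
  <language>en-us</language>
  <item>
   <title>2002-10-27 22:56:11</title>
   <link>http://www.jmac.org/linklog?2002-10-27#22:56:11</link>
  </item>
 </channel>
</rss>
END_RSS

# Parse the 0.91 document into the in-memory object model.
my $rss = XML::RSS->new;
$rss->parse($old_style);

# Ask for RSS 1.0 on the way out, then serialize.
$rss->{output} = '1.0';
my $converted = $rss->as_string;
print $converted;
```

Because both versions share one object model in memory, the conversion amounts to parsing in one dialect and serializing in another.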

9.2.2.1. Parsing

Using XML::RSS to read an existing document ought to look familiar if you've read the preceding chapters, and is quite simple:

use XML::RSS;

# Accept file from user arguments
my @rss_docs = @ARGV;

# For now, we'll assume they're all files on disk...
foreach my $rss_doc (@rss_docs) {

  # First, create a new RSS object that will represent the parsed doc
  my $rss = XML::RSS->new;
  
  # Now parse that puppy
  $rss->parsefile($rss_doc);
  
  # And that's all. Do whatever else we may want here.
}
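For the "whatever else" step, a minimal sketch might read the channel title and walk the item list. It parses from a string with the parse method rather than parsefile so that it is self-contained; the inlined RSS 0.91 document is a hypothetical, trimmed version of the sample feed. The channel accessor and the items array reference are the documented ways to reach parsed data.

```perl
use strict;
use warnings;
use XML::RSS;

# Hypothetical two-item feed, inlined for the sake of a runnable sketch.
my $xml = <<'END_RSS';
<?xml version="1.0" encoding="UTF-8"?>
<rss version="0.91">
 <channel>
  <title>Research Log</title>
  <link>http://www.jmac.org/linklog/</link>
  <description>An online research journal</description>
  <language>en-us</language>
  <item>
   <title>2002-10-27 22:56:11</title>
   <link>http://www.jmac.org/linklog?2002-10-27#22:56:11</link>
  </item>
  <item>
   <title>2002-10-18 19:07:06</title>
   <link>http://www.jmac.org/linklog?2002-10-18#19:07:06</link>
  </item>
 </channel>
</rss>
END_RSS

my $rss = XML::RSS->new;
$rss->parse($xml);

# With a single argument, channel() returns that field's value.
my $channel_title = $rss->channel('title');

# Each parsed item is a hash reference stored in the items array.
my @item_titles = map { $_->{title} } @{ $rss->{items} };

print "$channel_title\n";
print "  $_\n" for @item_titles;
```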

9.2.2.2. Inheriting from XML::Parser

If that parsefile method looked familiar, it had good reason: it's the same one used by grandpappy XML::Parser, both in word and deed.

XML::RSS takes direct advantage of XML::Parser's inheritability right off the bat, placing this module into its @ISA array before getting down to business with all that map definition.

It shouldn't surprise those familiar with object-oriented Perl programming that, while it chooses to define its own new method, it does little more than invoke SUPER::new. In doing so, it lets XML::Parser initialize itself as it sees fit. Let's look at some code from that module itself -- specifically its constructor, new, which we invoked in our example:

sub new {
    my $class = shift;
    my $self = $class->SUPER::new(Namespaces    => 1,
                                  NoExpand      => 1,
                                  ParseParamEnt => 0,
                                  Handlers      => { Char    => \&handle_char,
                                                     XMLDecl => \&handle_dec,
                                                     Start   => \&handle_start});
    bless ($self,$class);
    $self->_initialize(@_);
    return $self;
}

Note how the module calls its parent's new with very specific arguments. All are standard and well-documented setup instructions in XML::Parser's public interface, but by taking these parameters out of the user's hands and into its own, the XML::RSS module knows exactly what it's getting -- in this case, a parser object with namespace processing enabled, but not expansion or parsing of parameter entities -- and defines for itself what its handlers are.

The result of calling SUPER::new is an XML::Parser object, which this module doesn't want to hand back to its users -- doing so would diminish the point of all this abstraction! Therefore, it reblesses the object (at this point, deemed to be a new $self for this class) using the Perl-itically correct two-argument method, so that the returned object claims fealty to XML::RSS, not XML::Parser.
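The reblessing step can be illustrated without XML::Parser at all. In the sketch below, Base::Parser and My::Feed are hypothetical classes invented for this example, and the parent deliberately uses the one-argument form of bless so that the effect of the child's two-argument bless is visible.

```perl
use strict;
use warnings;

package Base::Parser;    # hypothetical parent class

sub new {
    my ($class, %args) = @_;
    my $self = {%args};
    # One-argument bless always blesses into the current package,
    # so the object starts life as a Base::Parser.
    return bless $self;
}

sub parsefile {
    my ($self, $file) = @_;
    return "parsed $file";
}

package My::Feed;        # hypothetical child class
our @ISA = ('Base::Parser');

sub new {
    my $class = shift;
    # Let the parent set itself up with arguments the child controls.
    my $self = $class->SUPER::new(Namespaces => 1);
    # Two-argument bless: the object now belongs to the child class.
    bless($self, $class);
    return $self;
}

package main;

my $feed = My::Feed->new;
print ref($feed), "\n";                         # prints "My::Feed"
print $feed->parsefile('linklog.rss'), "\n";    # prints "parsed linklog.rss"
```

The object still inherits every parent method through @ISA, so callers get parsefile for free while seeing only the child's class name.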

my %v0_9_ok_fields = (
    channel => { 
        title       => '',
        description => '',
        link        => '',
        },
    image  => { 
        title => '',
        url   => '',
        link  => '' 
        },
    textinput => { 
        title       => '',
        description => '',
        name        => '',
        link        => ''
        },
    items     => [],
    num_items => 0,
    version   => '',
    encoding  => ''
);
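One way a template hash like this could be put to work is as a blueprint that is copied into each new document and consulted when a field is set. The sketch below is an assumption about usage, not a description of XML::RSS's internals: the %template hash is a trimmed stand-in, and new_document and set_channel_field are hypothetical helpers.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Hypothetical, trimmed-down template in the spirit of %v0_9_ok_fields.
my %template = (
    channel   => { title => '', description => '', link => '' },
    items     => [],
    num_items => 0,
);

# Each document gets a deep copy, so nested hashes are never shared.
sub new_document {
    return dclone(\%template);
}

# Accept only the channel fields that the template knows about.
sub set_channel_field {
    my ($doc, $field, $value) = @_;
    die "Unknown channel field '$field'\n"
        unless exists $template{channel}{$field};
    $doc->{channel}{$field} = $value;
    return $doc;
}

my $doc = new_document();
set_channel_field($doc, title => "Link's Log");

# An unknown field is rejected; eval traps the resulting die.
my $ok = eval { set_channel_field($doc, colour => 'purple'); 1 };

print "title:    $doc->{channel}{title}\n";
print "rejected: ", ($ok ? 'no' : 'yes'), "\n";
```

The deep copy matters because a shallow copy would leave every document pointing at the same nested channel hash.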

#!/usr/bin/perl

# Turn the last 15 entries of Dr. Link's Weblog into an RSS 1.0 document,
# which gets printed to STDOUT.

use warnings;
use strict;

use XML::RSS;
use DBIx::Abstract;

my $MAX_ENTRIES = 15;

my ($output_version) = @ARGV;
$output_version ||= '1.0';
unless ($output_version eq '1.0' or $output_version eq '0.9'
        or $output_version eq '0.91') {
  die "Usage: $0 [version]\nWhere [version] is an RSS version to output: 0.9, 0.91, or 1.0\nDefault is 1.0\n";
}

my $dbh = DBIx::Abstract->connect({dbname=>'weblog',
                                   user=>'link',
                                   password=>'dirtyape'})
  or die "Couldn't connect to database.\n";

my ($date) = $dbh->select('max(date_added)', 'entry')->fetchrow_array;
my ($time) = $dbh->select('max(time_added)', 'entry')->fetchrow_array;

my $time_zone = "+05:00"; # This happens to be where I live. :)
my $rss_time = "${date}T$time$time_zone";
# base time is when I started the blog, for the syndication info
my $base_time = "2001-03-03T00:00:00$time_zone";

# I'll choose to use RSS version 1.0 here, which stuffs some meta-information into
# 'modules' that go into their own namespaces, such as 'dc' (for Dublin Core) or
# 'syn' (for RSS Syndication), but fortunately it doesn't make defining the document
# any more complex, as you can see below...

my $rss = XML::RSS->new(version=>'1.0', output=>$output_version);

$rss->channel(
              title=>'Dr. Links Weblog',
              link=>'http://www.jmac.org/linklog/',
              description=>"Dr. Link's weblog and online journal",
              dc=> {
                    date=>$rss_time,
                    creator=>'llink@jmac.org',
                    rights=>'Copyright 2002 by Dr. Lance Link',
                    language=>'en-us',
                   },
              syn=> {
                     updatePeriod=>'daily',
                     updateFrequency=>1,
                     updateBase=>$base_time,
                    },
             );

$dbh->query("select * from entry order by id desc limit $MAX_ENTRIES");
while (my $entry = $dbh->fetchrow_hashref) {
  # Replace XML-naughty characters with entities
  $$entry{entry} =~ s/&/&amp;/g;
  $$entry{entry} =~ s/</&lt;/g;
  $$entry{entry} =~ s/'/&apos;/g;
  $$entry{entry} =~ s/"/&quot;/g;
  $rss->add_item(
                 title=>"$$entry{date_added} $$entry{time_added}",
                 link=>"http://www.jmac.org/weblog?$$entry{date_added}#$$entry{time_added}",
                 description=>$$entry{entry},
                );
}

# Just throw the results into standard output. :)
print $rss->as_string;
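The entity replacement inside the loop of the script above can be pulled out into a small helper, sketched here with core Perl only. The escape_xml name is hypothetical, and this version also escapes the greater-than sign, which the script does not. The ampersand must be handled first, or the ampersands introduced by the later substitutions would themselves be escaped. It is worth checking whether your installed XML::RSS already encodes text on output, since escaping twice would corrupt the result.

```perl
use strict;
use warnings;

# Replace the five characters that are special in XML with
# their predefined entities. Works on a copy of the input.
sub escape_xml {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;     # must come first
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/'/&apos;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

my $escaped = escape_xml(q{Dr. Xing's "purple" < green & stalled});
print "$escaped\n";
# prints: Dr. Xing&apos;s &quot;purple&quot; &lt; green &amp; stalled
```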
