10.2. Subclassing
When writing XML-hacking Perl modules, another path to laziness involves standing on (and reading over) the shoulders of giants by subclassing general XML parsers as a quick way to build application-specific modules. You don't have to use object inheritance; the least complicated way to accomplish this sort of thing involves constructing a parser object in the usual way, sticking it somewhere convenient, and turning around whenever you want to do something XMLy. Here is some bogus code for you:
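A minimal sketch of that has-a arrangement follows. Both package names are hypothetical, and Local::StubParser is a toy stand-in for a real parser class, defined inline only so the sketch runs with core Perl:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a real parser class, so the sketch runs
# without any CPAN modules installed.
package Local::StubParser;

sub new { my $class = shift; return bless {}, $class }

# Pretend to parse: report the root element's name, or die if none is found.
sub parse_string {
    my ($self, $xml) = @_;
    $xml =~ /<([\w:-]+)/ or die "No root element found\n";
    return { root => $1 };
}

# The application-specific module: it has a parser rather than being one.
package My::XMLThingy;

sub new {
    my $class = shift;
    my $self  = bless {}, $class;
    # Build the parser in the usual way and stash it for later.
    $self->{parser} = Local::StubParser->new;
    return $self;
}

# Turn to the stored parser whenever there is something XMLy to do.
sub root_name {
    my ($self, $xml) = @_;
    my $result = $self->{parser}->parse_string($xml);
    return $result->{root};
}

package main;

my $thingy = My::XMLThingy->new;
print $thingy->root_name('<monkey><name>Bubbles</name></monkey>'), "\n";
```

Nothing is inherited here: My::XMLThingy exposes only the methods it defines itself and forwards the real work to the object stored under its parser key.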
Choosing to subclass a parser has some bonuses, though. First, it gives your module the same basic user API as the module in question, including all the methods for parsing, which can be quite lazily useful -- especially if the module you're writing is an XML application helper module. Second, if you're using a tree-based parser, you can steal -- er, I mean embrace and extend -- that parser's data structure representation of the parsed document and then twist it to better serve your own nefarious goal while doing as little extra work as possible. This step is possible through the magic of Perl's class blessing and inheritance functionality.
10.2.1. Subclassing Example: XML::ComicsML
For this example, we're going to set our notional MonkeyML aside in favor of the grim reality of ComicsML, a markup language for describing online comics.[39] It shares a lot of features and philosophies with RSS, providing, among other things, a standard way for comics to share web-syndication information, so a ComicsML helper module might be a boon for any Perl hacker who wishes to write programs that work with syndicated web comics.
We will go down a DOMmish path for this example and pull XML::LibXML down as our internal mechanism of choice, since it's (mostly) DOM compliant and is a fast parser. Our goal is to create a fully object-oriented API for manipulating ComicsML documents and all the major child elements within them:
Without further ado, let's start coding.
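As a starting point, here is a hedged sketch of what the top of such a package might look like. It assumes XML::LibXML is installed and uses its parse_file, parse_string, and parse_fh methods. The _root_of helper, the XML::ComicsML::Comic class, and the deliberately tiny rebless (it recognizes only a comic root element) are placeholders for illustration:

```perl
use strict;
use warnings;

package XML::ComicsML;
use base 'XML::LibXML';    # inherit new() and the rest of the parser API

# Each override parses as usual, then hands back the (reblessed) root
# element instead of the document object.
sub parse_file {
    my $self = shift;
    return $self->_root_of($self->SUPER::parse_file(@_));
}

sub parse_string {
    my $self = shift;
    return $self->_root_of($self->SUPER::parse_string(@_));
}

sub parse_fh {
    my $self = shift;
    return $self->_root_of($self->SUPER::parse_fh(@_));
}

# Hypothetical helper: pull the root element out of the parsed document.
sub _root_of {
    my ($self, $doc) = @_;
    return $self->rebless($doc->documentElement);
}

# Placeholder rebless: only the root <comic> element gets a new class.
sub rebless {
    my ($self, $node) = @_;
    bless $node, 'XML::ComicsML::Comic' if $node->nodeName eq 'comic';
    return $node;
}

# The reblessed root still behaves like an ordinary element.
package XML::ComicsML::Comic;
use parent -norequire, 'XML::LibXML::Element';

package main;

my $comic = XML::ComicsML->new->parse_string(
    '<comic><title>Ape Tales</title></comic>'
);
print ref($comic), "\n";                 # the class chosen by rebless
print $comic->findvalue('title'), "\n";  # inherited XML::LibXML method
```

Because the new class inherits from XML::LibXML::Element, every DOM method still works on the reblessed node; the subclass only adds to it.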
What exactly are we doing here? So far, we have declared the package to be a child of XML::LibXML (by way of the use base pragma) and then written our own versions of its three parsing methods. All do the same thing, though: they call XML::LibXML's own method of the same name, capture the root element of the returned document tree object, and then pass it to these internal methods:
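A sketch of how two such internal methods could fit together is shown below. To keep it runnable with core Perl only, it uses a hypothetical Local::FakeElement class in place of XML::LibXML::Element, and the list of interesting element names is an assumption:

```perl
use strict;
use warnings;

# Hypothetical stand-in for XML::LibXML::Element. It knows nothing
# but its own element name.
package Local::FakeElement;

sub new      { my ($class, $name) = @_; return bless { name => $name }, $class }
sub nodeName { return $_[0]{name} }

package XML::ComicsML;

# Element names that get a class of their own (an assumed list).
my %INTERESTING = map { $_ => 1 } qw(comic strip panel panel-desc person);

# Munge an element name into a class name: "panel-desc" becomes
# "XML::ComicsML::PanelDesc".
sub element2class {
    my ($self, $name) = @_;
    my $class = join '', map { ucfirst $_ } split /-/, $name;
    return "XML::ComicsML::$class";
}

# Rebless interesting elements; hand everything else back untouched.
sub rebless {
    my ($self, $node) = @_;
    my $name = $node->nodeName;
    return $node unless $INTERESTING{$name};
    return bless $node, $self->element2class($name);
}

package main;

# comic and panel-desc are reblessed; url is returned as it came in.
for my $name (qw(comic panel-desc url)) {
    my $node = XML::ComicsML->rebless(Local::FakeElement->new($name));
    printf "%-10s => %s\n", $name, ref $node;
}
```

In a version built on XML::LibXML, each target class would also need to inherit from XML::LibXML::Element so that the reblessed nodes keep their DOM methods.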
The rebless method takes an element node, peeks at its name, and sees if it appears on its hardcoded list of "interesting" element names. If it appears on the list, it chooses a class name for it (with the help of that silly element2class method) and reblesses it into that class.
This behavior may seem irrational until you consider the fact that XML::LibXML objects are not very persistent, due to the way they are bound with the low-level, C-based structures underneath the Perly exterior. If I get a list of objects representing some node's children, and then ask for the list again later, I might not get the same Perl objects, though both sets will work (being APIs to the same structures on the C library-produced tree). This lack of persistence prevents us from, say, crawling the whole tree as soon as the document is parsed, blessing the "interesting" elements into our own ComicsML-specific classes, and calling it done.
To get around this behavior, we do a little dirty work, quietly turning the Element objects that XML::LibXML hands us into our own kinds of objects, where applicable. The main advantage of this, beyond the egomaniacal glee of putting our own (class) name on someone else's work, is the fact that these reblessed objects are now subject to having some methods of our own design called on them.
Now we can finally define these classes. First, we will taunt you by way of the AUTOLOAD method that exists in XML::ComicsML::Element, a virtual base class from which our "real" element classes all inherit.
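The accessor half of such an AUTOLOAD might look roughly like the sketch below. It is an illustration under stated assumptions, not the module's actual code: elements are plain hashes standing in for XML::LibXML::Element objects, and the child and attribute names listed for the Comic class are made up for the example.

```perl
use strict;
use warnings;

# Virtual base class. Elements here are plain hashes standing in for
# XML::LibXML::Element objects, to keep the sketch runnable on its own.
package XML::ComicsML::Element;

our $AUTOLOAD;

sub new {
    my ($class, %args) = @_;
    return bless { attrs => {}, children => [], text => '', %args }, $class;
}

# Subclasses override these to list what they may legally contain.
sub element   { return () }
sub attribute { return () }

sub AUTOLOAD {
    my $self = shift;
    (my $name = $AUTOLOAD) =~ s/.*:://;    # strip the package qualifier
    return if $name eq 'DESTROY';          # never autoload the destructor

    # Legal child element: return the text of the first matching child.
    if (grep { $_ eq $name } $self->element) {
        my ($child) = grep { $_->{name} eq $name } @{ $self->{children} };
        return defined $child ? $child->{text} : undef;
    }

    # Legal attribute: return its value.
    if (grep { $_ eq $name } $self->attribute) {
        return $self->{attrs}{$name};
    }

    die "No such method '$name' for " . ref($self) . "\n";
}

# One "real" element class, declaring its legal children and attributes.
package XML::ComicsML::Comic;
use parent -norequire, 'XML::ComicsML::Element';

sub element   { return qw(title url) }    # assumed child element names
sub attribute { return qw(id) }           # assumed attribute name

package main;

my $comic = XML::ComicsML::Comic->new(
    name     => 'comic',
    attrs    => { id => 'ape-tales' },
    children => [
        XML::ComicsML::Element->new(name => 'title', text => 'Ape Tales'),
    ],
);
print $comic->title, "\n";    # resolved by AUTOLOAD as a child element
print $comic->id,    "\n";    # resolved by AUTOLOAD as an attribute
```

The element classes themselves stay almost empty: each one only declares its two lists, and the shared AUTOLOAD does the rest.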
This glop of code lords it over all our element classes' basic child-element and attribute accessors. Like every AUTOLOAD method, it is called when someone invokes an undefined method. It first checks whether the method's name appears in that class's hardcoded list of legal child elements and attributes (available through the element() and attribute() methods, respectively). Failing that, if the method has a name like add_foo or remove_foo, it enters either constructor or destructor mode:
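The two modes can be sketched on their own as follows. This is again a simplified illustration: elements are plain hashes rather than XML::LibXML nodes, the Strip class and its legal panel child are assumptions, and new children are left in the base class instead of being reblessed.

```perl
use strict;
use warnings;

package XML::ComicsML::Element;

use Scalar::Util 'refaddr';

our $AUTOLOAD;

sub new {
    my ($class, $name) = @_;
    return bless { name => $name, children => [] }, $class;
}

sub element { return () }    # subclasses list their legal child elements

sub AUTOLOAD {
    my $self = shift;
    (my $method = $AUTOLOAD) =~ s/.*:://;
    return if $method eq 'DESTROY';

    # Only add_foo and remove_foo are handled in this sketch.
    my ($mode, $name) = $method =~ /^(add|remove)_(\w+)$/
        or die "No such method '$method'\n";
    die "'$name' is not a legal child of $self->{name}\n"
        unless grep { $_ eq $name } $self->element;

    if ($mode eq 'add') {
        # Constructor mode: make a new child, attach it, and return it.
        my $child = XML::ComicsML::Element->new($name);
        push @{ $self->{children} }, $child;
        return $child;
    }

    # Destructor mode: detach the child object passed as an argument.
    my ($doomed) = @_;
    @{ $self->{children} } =
        grep { refaddr($_) != refaddr($doomed) } @{ $self->{children} };
    return;
}

package XML::ComicsML::Strip;
use parent -norequire, 'XML::ComicsML::Element';

sub element { return qw(panel) }    # assumed legal child

package main;

my $strip  = XML::ComicsML::Strip->new('strip');
my $first  = $strip->add_panel;
my $second = $strip->add_panel;
$strip->remove_panel($first);
print scalar @{ $strip->{children} }, " panel(s) left\n";
```

Comparing with refaddr removes exactly the object that was passed in, rather than every child that happens to share its name.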
Many more element classes exist in the real-life version of ComicsML -- ones that deal with people, strips within a comic, panels within a strip, and so on. Later in this chapter, we'll take what we've written here and apply it to an actual problem.
Copyright © 2002 O'Reilly & Associates. All rights reserved.