5.5. A Handler Base Class

SAX doesn't distinguish between different elements; it leaves that burden up to you. You have to sort out the element name in the start_element( ) handler, and maybe use a stack to keep track of element hierarchy. Don't you wish there were some way to abstract that stuff? Ken MacLeod has done just that with his XML::Handler::Subs module.

This module defines an object that branches handler calls to more specific handlers. If you want a handler that deals only with <title> elements, you can write that handler and it will be called. The handler dealing with a start tag must begin with s_, followed by the element's name (replace special characters with an underscore). End tag handlers are the same, but start with e_ instead of s_.

That's not all. The base object also has a built-in stack and provides an accessor method to check if you are inside a particular element. The $self->{Names} variable refers to a stack of element names. Use the method in_element( $name ) to test whether the parser is inside an element named $name at any point in time.

To try this out, let's write a program that does something element-specific. Given an HTML file, the program outputs everything inside an <h1> element, even inline elements used for emphasis. The code, shown in Example 5-7, is breathtakingly simple.

Example 5-7. A program subclassing the handler base
Let's feed the program a test file:
This is what we get on the other side:

    Summary of file:
    [Fooby as a child]
    [Fooby grows up]
    [Fooby is in big trouble!]

Even the text inside the <em> element was included, thanks to the call to in_element( ). XML::Handler::Subs is definitely a useful module to have when doing SAX processing.
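To make the dispatch convention described in this section more concrete, here is a minimal sketch in plain Perl. It is not XML::Handler::Subs and does not reproduce its source; MiniSubs, H1Grabber, summarize( ), and the Out field are hypothetical names chosen for illustration, and the event stream is written by hand rather than produced by a parser. The sketch assumes that in_element( ) looks at the whole stack of open elements, as the description above suggests.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# MiniSubs: a hypothetical stand-in for a dispatching handler base class.
# It only mimics the s_/e_ naming convention and the Names stack.
package MiniSubs;

sub new {
    my ($class, %args) = @_;
    my $self = { %args, Names => [] };   # Names is the stack of open elements
    return bless $self, $class;
}

# Map an element name to a handler such as s_h1 or e_h1, if one exists.
sub _method_for {
    my ($self, $prefix, $name) = @_;
    (my $safe = $name) =~ s/[^A-Za-z0-9_]/_/g;   # special characters become "_"
    return $self->can($prefix . $safe);          # undef when no handler is defined
}

sub start_element {
    my ($self, $element) = @_;
    push @{ $self->{Names} }, $element->{Name};
    if (my $handler = $self->_method_for('s_', $element->{Name})) {
        $self->$handler($element);
    }
}

sub end_element {
    my ($self, $element) = @_;
    if (my $handler = $self->_method_for('e_', $element->{Name})) {
        $self->$handler($element);
    }
    pop @{ $self->{Names} };
}

# Assumption: true when an element called $name is open anywhere on the stack.
sub in_element {
    my ($self, $name) = @_;
    return scalar grep { $_ eq $name } @{ $self->{Names} };
}

# H1Grabber: collects the text of every <h1>, including nested inline elements.
package H1Grabber;
our @ISA = ('MiniSubs');

sub s_h1 { $_[0]{Out} .= '[' }
sub e_h1 { $_[0]{Out} .= "]\n" }

sub characters {
    my ($self, $props) = @_;
    $self->{Out} .= $props->{Data} if $self->in_element('h1');
}

package main;

# Replay a list of [ method_name, argument ] pairs against a fresh handler.
sub summarize {
    my @events  = @_;
    my $handler = H1Grabber->new(Out => '');
    for my $event (@events) {
        my ($method, $arg) = @$event;
        $handler->$method($arg);
    }
    return $handler->{Out};
}

# Hand-written events standing in for parser output (hypothetical input).
my @events = (
    [ start_element => { Name => 'body' } ],
    [ start_element => { Name => 'h1' } ],
    [ characters    => { Data => 'Fooby ' } ],
    [ start_element => { Name => 'em' } ],
    [ characters    => { Data => 'grows' } ],
    [ end_element   => { Name => 'em' } ],
    [ characters    => { Data => ' up' } ],
    [ end_element   => { Name => 'h1' } ],
    [ start_element => { Name => 'p' } ],
    [ characters    => { Data => 'Not a heading.' } ],
    [ end_element   => { Name => 'p' } ],
    [ end_element   => { Name => 'body' } ],
);

print summarize(@events);
```

Because the base class looks handlers up by name with can( ), the subclass only has to define s_h1( ) and e_h1( ); every other element passes through silently while still being tracked on the stack.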
Copyright © 2002 O'Reilly & Associates. All rights reserved.