19.3. Filters

A SAX filter sits between the parser and the client application and intercepts the messages that these two objects pass to each other. It can pass these messages along unchanged, or it can modify, replace, or block them. To a client application, the filter looks like a parser, that is, an XMLReader. To the parser, the filter looks like a client application, that is, a ContentHandler. SAX filters are implemented by subclassing the org.xml.sax.helpers.XMLFilterImpl class.[8] This class implements all the required interfaces of SAX for both parsers and client applications. That is, its signature is as follows:

    public class XMLFilterImpl implements XMLFilter, EntityResolver,
        DTDHandler, ContentHandler, ErrorHandler

Because XMLFilter itself extends XMLReader, every XMLFilterImpl is also an XMLReader. Your own filters will extend this class and override those methods that correspond to the messages you want to filter. For example, if you wanted to filter out all processing instructions, you would write a filter that overrides the processingInstruction( ) method to do nothing, as shown in Example 19-5.

Example 19-5. A SAX filter that removes processing instructions

    import org.xml.sax.helpers.XMLFilterImpl;

    public class ProcessingInstructionStripper extends XMLFilterImpl {

      public void processingInstruction(String target, String data) {
        // Because we do nothing, processing instructions read in the
        // document are *not* passed to the client application
      }

    }

If instead you wanted to replace a processing instruction with an element whose name was the same as the processing instruction's target and whose text content was the processing instruction's data, you'd call the startElement( ), characters( ), and endElement( ) methods from inside the processingInstruction( ) method after filling in the arguments with the relevant data from the processing instruction, as shown in Example 19-6.

Example 19-6. A SAX filter that converts processing instructions to elements

    import org.xml.sax.*;
    import org.xml.sax.helpers.*;

    public class ProcessingInstructionConverter extends XMLFilterImpl {

      public void processingInstruction(String target, String data)
       throws SAXException {

        // AttributesImpl is a helper class in the org.xml.sax.helpers
        // package for precisely this case. We don't really want to add
        // any attributes here, but we need to pass something as the
        // fourth argument to startElement( ).
        Attributes emptyAttributes = new AttributesImpl( );

        // We won't use any namespace for the element
        startElement("", target, target, emptyAttributes);

        // Convert the String data to a char array
        char[] text = data.toCharArray( );
        characters(text, 0, text.length);

        endElement("", target, target);
      }

    }

We used this filter before passing Example 19-2 into a program that echoes an XML document onto System.out and were a little surprised to see this come out:

    <xml-stylesheet>type="text/css" href="person.css"</xml-stylesheet>
    <person xmlns="http://xml.oreilly.com/person">
      <name:name xmlns:name="http://xml.oreilly.com/name">
        <name:first>Sydney</name:first>
        <name:last>Lee</name:last>
      </name:name>
      <assignment project_id="p2"></assignment>
    </person>

This document is not well-formed! The specific problem is that there are two independent root elements. However, on further consideration that's really not too surprising. Well-formedness checking is normally done by the underlying parser when it reads the text form of an XML document. SAX filters should, but are not absolutely required to, provide well-formed XML data to client applications. Indeed, they can produce substantially more malformed data than this by including start-tags that are not matched by end-tags, text that contains illegal characters such as the formfeed or the vertical tab, and XML names that contain non-name characters such as * and §. You need to be very careful before assuming data you receive from a filter is valid or well-formed.

If you want to invoke a method without filtering it, or you want to invoke the same method in the underlying handler, you can prefix a call to it with the super keyword. This invokes the variant of the method from the superclass. By default, each method in XMLFilterImpl just passes its arguments unchanged to the equivalent method on the other side of the filter: events go on to the client's handlers, and requests go on to the parent XMLReader. Example 19-7 demonstrates with a filter that changes all character data to uppercase by overriding the characters( ) method.

Example 19-7. A SAX filter that converts text to uppercase

    import org.xml.sax.*;
    import org.xml.sax.helpers.*;

    public class UpperCaseFilter extends XMLFilterImpl {

      public void characters(char[] text, int start, int length)
       throws SAXException {

        // Copy only the relevant slice of the buffer before converting it
        String temp = new String(text, start, length);
        temp = temp.toUpperCase( );
        text = temp.toCharArray( );
        super.characters(text, 0, text.length);
      }

    }

Using a filter involves these steps:

1. Create a filter object, normally by invoking its constructor.
2. Create the XMLReader that will actually parse the document, for example with XMLReaderFactory.createXMLReader( ).
3. Attach the filter to that parser with the filter's setParent( ) method.
4. Install a ContentHandler in the filter with setContentHandler( ).
5. Parse the document by calling the filter's parse( ) method.
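
As a minimal, self-contained sketch of this sequence, the following program creates an uppercasing filter, attaches it to a parser, installs a handler, and parses an in-memory document through the filter. The names TextCollector, FilterDemo, and parseThroughFilter are hypothetical and exist only for this illustration. The use of Locale.ROOT is an added assumption that keeps the result independent of the default locale. Recent JDKs mark XMLReaderFactory as deprecated, so treat that call as a placeholder for whatever parser factory your platform prefers.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;
import org.xml.sax.helpers.XMLReaderFactory;

// Same idea as the uppercasing filter in the text: rewrite character data,
// then hand it on to the client with super.characters().
class UpperCaseFilter extends XMLFilterImpl {
    @Override
    public void characters(char[] text, int start, int length) throws SAXException {
        char[] upper = new String(text, start, length)
                .toUpperCase(Locale.ROOT).toCharArray();
        super.characters(upper, 0, upper.length);
    }
}

// Hypothetical client handler: collects every run of character data it receives.
class TextCollector extends DefaultHandler {
    final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }
}

class FilterDemo {
    // Parses the given XML through the filter and returns the text the client saw.
    static String parseThroughFilter(String xml) {
        try {
            UpperCaseFilter filter = new UpperCaseFilter();        // create the filter
            XMLReader parser = XMLReaderFactory.createXMLReader(); // create the real parser
            filter.setParent(parser);                              // attach filter to parser
            TextCollector collector = new TextCollector();
            filter.setContentHandler(collector);                   // install the client handler
            // Call parse() on the filter, never on the underlying parser.
            filter.parse(new InputSource(new StringReader(xml)));
            return collector.text.toString();
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // prints "HELLO, XML WORLD!"
        System.out.println(parseThroughFilter("<greeting>Hello, <b>XML</b> world!</greeting>"));
    }
}
```

The client handler never touches the parser directly; everything it sees has already passed through the filter's characters( ) method.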

Details can vary a little from application to application. For instance, you might install other handlers besides the ContentHandler or change the parent between documents. However, once the filter has been attached to the underlying XMLReader, you should not directly invoke any methods on this underlying parser; you should only talk to it through the filter. For example, this is how you'd use the filter in Example 19-7 to parse a document:

    XMLFilter filter = new UpperCaseFilter( );
    filter.setParent(XMLReaderFactory.createXMLReader( ));
    filter.setContentHandler(yourContentHandlerObject);
    filter.parse(document);

Notice specifically that you invoke the filter's parse( ) method, not the underlying parser's parse( ) method.

Copyright © 2002 O'Reilly & Associates. All rights reserved.