Chapter 25. SAX Reference
SAX, the Simple
API for XML, is a straightforward, event-based API used to parse XML
documents. David Megginson, SAX's
original author, placed SAX in the public domain. SAX is bundled with
all parsers that implement the API, including Xerces, MSXML, Crimson,
the Oracle XML Parser for Java, and Ælfred. You can also download it,
along with the full source code, from
http://sax.sourceforge.net/.
SAX was originally defined as a Java API and is intended primarily
for parsers written in Java, so this chapter focuses on its Java
implementation. However, ports to other object-oriented languages,
such as C++, Python, Perl, and Eiffel, are common and usually quite
similar.
TIP:
This chapter covers SAX2 exclusively. As of 2002, all major parsers that
support SAX also support SAX2. The major change from SAX1 to SAX2 was the
addition of
namespace support. This addition necessitated changing the names and
signatures of almost every method and class in SAX. The old SAX1
methods and classes are still available, but they're
now deprecated and shouldn't be used.
25.1. The org.xml.sax Package
The org.xml.sax package contains the core interfaces
and classes that make up the Simple API for XML.
The ContentHandler Interface
ContentHandler is
the key piece of SAX. Almost every SAX
program needs to use this interface.
ContentHandler is a callback interface. An
instance of this interface is passed to the parser via the
setContentHandler( )
method of XMLReader. As the parser reads the
document, it invokes the methods in its
ContentHandler to tell the program
what's in the document:
package org.xml.sax;

public interface ContentHandler {
    public void setDocumentLocator(Locator locator);
    public void startDocument( ) throws SAXException;
    public void endDocument( ) throws SAXException;
    public void startPrefixMapping(String prefix, String uri)
        throws SAXException;
    public void endPrefixMapping(String prefix) throws SAXException;
    public void startElement(String namespaceURI, String localName,
        String qualifiedName, Attributes atts) throws SAXException;
    public void endElement(String namespaceURI, String localName,
        String qualifiedName) throws SAXException;
    public void characters(char[] text, int start, int length)
        throws SAXException;
    public void ignorableWhitespace(char[] text, int start, int length)
        throws SAXException;
    public void processingInstruction(String target, String data)
        throws SAXException;
    public void skippedEntity(String name) throws SAXException;
}
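As a rough illustration of the callback model described above, the following sketch counts the elements in a document. The class name ElementCounter and its countElements( ) helper are hypothetical names invented for this example. It extends org.xml.sax.helpers.DefaultHandler, a convenience class that supplies do-nothing implementations of every ContentHandler method, so only the callbacks of interest need to be overridden. It also assumes a Java runtime that bundles a SAX2 parser; XMLReaderFactory is marked deprecated in Java 9 and later, hence the suppressed warning.

```java
import java.io.IOException;
import java.io.StringReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical example class; not part of SAX itself.
@SuppressWarnings("deprecation") // XMLReaderFactory is deprecated in Java 9+
class ElementCounter extends DefaultHandler {
    private int count;

    // Called once by the parser before any other content callback.
    @Override
    public void startDocument() {
        count = 0;
    }

    // Called once for every start-tag (and every empty-element tag).
    @Override
    public void startElement(String namespaceURI, String localName,
            String qualifiedName, Attributes atts) {
        count++;
    }

    int getCount() {
        return count;
    }

    // Parses the given XML text and returns the number of elements seen.
    static int countElements(String xml) {
        try {
            XMLReader reader = XMLReaderFactory.createXMLReader();
            ElementCounter counter = new ElementCounter();
            reader.setContentHandler(counter);
            reader.parse(new InputSource(new StringReader(xml)));
            return counter.getCount();
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // One a element plus two b elements: prints 3
        System.out.println(countElements("<a><b/><b>text</b></a>"));
    }
}
```

The parser, not the program, drives the control flow here: the handler simply accumulates state as the callbacks arrive.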
The XMLReader interface represents the XML parser that reads XML
documents. You generally do not implement this interface yourself.
Instead, use the
org.xml.sax.helpers.XMLReaderFactory class to
build a parser-specific implementation. Then use this
parser's various setter methods to configure the
parsing process. Finally, invoke the parse( )
method to read the document. As the document is read, the parser
calls back to methods in your own implementations of ContentHandler,
ErrorHandler, EntityResolver, and DTDHandler:
package org.xml.sax;

public interface XMLReader {
    public boolean getFeature(String name)
        throws SAXNotRecognizedException, SAXNotSupportedException;
    public void setFeature(String name, boolean value)
        throws SAXNotRecognizedException, SAXNotSupportedException;
    public Object getProperty(String name)
        throws SAXNotRecognizedException, SAXNotSupportedException;
    public void setProperty(String name, Object value)
        throws SAXNotRecognizedException, SAXNotSupportedException;
    public void setEntityResolver(EntityResolver resolver);
    public EntityResolver getEntityResolver( );
    public void setDTDHandler(DTDHandler handler);
    public DTDHandler getDTDHandler( );
    public void setContentHandler(ContentHandler handler);
    public ContentHandler getContentHandler( );
    public void setErrorHandler(ErrorHandler handler);
    public ErrorHandler getErrorHandler( );
    public void parse(InputSource input) throws IOException, SAXException;
    public void parse(String systemID) throws IOException, SAXException;
}
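The build, configure, and parse sequence described for XMLReader might look like the following sketch, which collects the character data of a document. TextExtractor and extractText( ) are hypothetical names used only for this example, and DefaultHandler (from org.xml.sax.helpers) stands in for a hand-written ContentHandler. The feature URI shown is the standard SAX2 namespaces feature, which is already on by default; setting it here only demonstrates the call.

```java
import java.io.IOException;
import java.io.StringReader;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical example class; not part of SAX itself.
@SuppressWarnings("deprecation") // XMLReaderFactory is deprecated in Java 9+
class TextExtractor {

    // Returns the concatenated character data of the given XML text.
    static String extractText(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            // 1. Build a parser-specific XMLReader implementation.
            XMLReader reader = XMLReaderFactory.createXMLReader();

            // 2. Configure it through its setter methods.
            reader.setFeature("http://xml.org/sax/features/namespaces", true);
            reader.setContentHandler(new DefaultHandler() {
                // May be called several times for one run of text.
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });

            // 3. Parse; the callbacks fire while this call is running.
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(
            extractText("<greeting>Hello, <b>SAX</b>!</greeting>"));
    }
}
```

In recent Java versions, javax.xml.parsers.SAXParserFactory is the usual way to obtain an XMLReader; the rest of the sequence is unchanged.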
Most exceptions thrown by SAX methods are instances of the SAXException
class or one of its subclasses. The main exception to this rule is
the parse( ) method of
XMLReader, which may throw a raw
IOException if a purely I/O-related error occurs;
for example, if a socket is broken before the parser finishes reading
the document from the network.
Besides the usual exception methods, such as getMessage( )
and printStackTrace( ), that
SAXException inherits from or overrides in its
superclasses, SAXException adds a
getException( ) method
to return the nested exception that caused the
SAXException to be thrown in the first place:
package org.xml.sax;

public class SAXException extends Exception {
    public SAXException(String message);
    public SAXException(Exception ex);
    public SAXException(String message, Exception ex);
    public String getMessage( );
    public Exception getException( );
    public String toString( );
}
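To show how getException( ) might be used, the sketch below has a handler wrap a NumberFormatException in a SAXException, then recovers the nested exception after parse( ) fails. NestedExceptionDemo, nestedCauseName( ), and the count attribute are all hypothetical, chosen only for this illustration. The sketch assumes the parser passes a SAXException thrown by a handler through to the caller of parse( ), as the parser bundled with the JDK does.

```java
import java.io.IOException;
import java.io.StringReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical example class; not part of SAX itself.
@SuppressWarnings("deprecation") // XMLReaderFactory is deprecated in Java 9+
class NestedExceptionDemo {

    // Returns the class name of the exception nested inside the
    // SAXException raised during parsing, or null if there was none.
    static String nestedCauseName(String xml) {
        try {
            XMLReader reader = XMLReaderFactory.createXMLReader();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String namespaceURI,
                        String localName, String qualifiedName,
                        Attributes atts) throws SAXException {
                    String value = atts.getValue("count");
                    if (value == null) return;
                    try {
                        Integer.parseInt(value);
                    } catch (NumberFormatException ex) {
                        // Wrap the underlying problem in a SAXException.
                        throw new SAXException("Bad count attribute", ex);
                    }
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return null; // parsed without error
        } catch (SAXException e) {
            Exception nested = e.getException();
            return nested == null ? null : nested.getClass().getName();
        } catch (IOException e) {
            throw new IllegalStateException("Unexpected I/O error", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nestedCauseName("<item count=\"many\"/>"));
    }
}
```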
If the parser detects a well-formedness
error while reading a document, it throws
a SAXParseException, a subclass of
SAXException.
SAXParseExceptions are also passed as arguments to
the methods of the ErrorHandler interface, where
you can decide whether you want to throw them.
Besides the methods it inherits from its superclasses, this class
adds methods to get the line number, column number, system ID, and
public ID of the document where the error was detected:
package org.xml.sax;

public class SAXParseException extends SAXException {
    public SAXParseException(String message, Locator locator);
    public SAXParseException(String message, Locator locator,
        Exception e);
    public SAXParseException(String message, String publicID,
        String systemID, int lineNumber, int columnNumber);
    public SAXParseException(String message, String publicID,
        String systemID, int lineNumber, int columnNumber, Exception e);
    public String getPublicId( );
    public String getSystemId( );
    public int getLineNumber( );
    public int getColumnNumber( );
}
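A minimal sketch of reading the location information from a SAXParseException follows. ParseErrorLocator and errorLine( ) are hypothetical names for this example. A DefaultHandler is installed as the ErrorHandler because its fatalError( ) method rethrows the exception it receives, which is one way of deciding to throw it. The exact line and column a parser reports for a given error are implementation dependent.

```java
import java.io.IOException;
import java.io.StringReader;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical example class; not part of SAX itself.
@SuppressWarnings("deprecation") // XMLReaderFactory is deprecated in Java 9+
class ParseErrorLocator {

    // Returns the line number of the first well-formedness error,
    // or -1 if the document parsed cleanly.
    static int errorLine(String xml) {
        try {
            XMLReader reader = XMLReaderFactory.createXMLReader();
            // DefaultHandler.fatalError( ) rethrows the exception.
            reader.setErrorHandler(new DefaultHandler());
            reader.parse(new InputSource(new StringReader(xml)));
            return -1;
        } catch (SAXParseException e) {
            // Location of the error as reported by the parser.
            System.err.println("Error at line " + e.getLineNumber()
                + ", column " + e.getColumnNumber()
                + " of " + e.getSystemId() + ": " + e.getMessage());
            return e.getLineNumber();
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("Unexpected failure", e);
        }
    }

    public static void main(String[] args) {
        // The b element is never closed, so this is not well-formed.
        System.out.println(errorLine("<a>\n<b>\n</a>"));
    }
}
```

Because the document here is read from a string, getSystemId( ) has no URL to report and returns null; it is more useful when parsing from a file or network location.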
Copyright © 2002 O'Reilly & Associates. All rights reserved.