More Handlers (Java & XML, 2nd Edition)

In the last chapter, I showed you the ContentHandler and ErrorHandler interfaces and briefly mentioned the EntityResolver and DTDHandler interfaces as well. Now that you've got a good understanding of SAX basics, you're ready to look at these two other handlers. You'll find that you use EntityResolver every now and then (more if you're writing applications to be resold), and that DTDHandler is something you'll rarely pull out of your bag of tricks.

4.2.1. Using an EntityResolver

The first of these new handlers is org.xml.sax.EntityResolver. This interface does exactly what it says: resolves entities (or at least declares a method that resolves entities, but you get the idea). The interface defines only a single method, and it looks like this:

public InputSource resolveEntity(String publicID, String systemID)
    throws SAXException, IOException;

You can create an implementation of this interface and register it with your XMLReader instance (through setEntityResolver( ), not surprisingly). Once that's done, every time the reader comes across a reference to an external entity, it passes the public ID and system ID for that entity to the resolveEntity( ) method of your implementation. This is your chance to change the normal process of entity resolution.

Typically, the XML reader resolves the entity through the specified public or system ID, whether it be a file, URL, or other resource. If the return value from the resolveEntity( ) method is null, this process executes unchanged. As a result, whatever code you add to your resolveEntity( ) implementation, always make sure that it returns null in the default case. In other words, start with an implementation class that looks like Example 4-1.

Example 4-1. Simple implementation of EntityResolver

package javaxml2;

import java.io.IOException;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class SimpleEntityResolver implements EntityResolver {
    
    public InputSource resolveEntity(String publicID, String systemID)
        throws IOException, SAXException {
        
        // In the default case, return null
        return null;    
    }
}

You can compile this class with no problems, and register it with the reader implementation used in the SAXTreeViewer class within the buildTree( ) method:

        // Create instances needed for parsing
        XMLReader reader = 
            XMLReaderFactory.createXMLReader(vendorParserClass);
        ContentHandler jTreeContentHandler = 
            new JTreeContentHandler(treeModel, base, reader);
        ErrorHandler jTreeErrorHandler = new JTreeErrorHandler( );

        // Register content handler
        reader.setContentHandler(jTreeContentHandler);

        // Register error handler
        reader.setErrorHandler(jTreeErrorHandler);
            
        // Register entity resolver
        reader.setEntityResolver(new SimpleEntityResolver( ));

        // Other instructions and parsing...

Recompiling and rerunning the example class produces no visible change. Of course, that's exactly what was predicted, so don't be too surprised. Because the method always returns null, the process of entity resolution proceeds normally. If you don't believe that anything is happening, though, you can make this small change to echo what's going on to the system output:

    public InputSource resolveEntity(String publicID, String systemID)
        throws IOException, SAXException {
            
        System.out.println("Found entity with public ID " + publicID +
            " and system ID " + systemID);
        
        // In the default case, return null
        return null;    
    }

Recompile this class and run the sample tree viewer. Once the Swing GUI comes up, move it out of the way and check out the shell or command prompt output; it should look similar to Example 4-2.

Example 4-2. Output from SAXTreeViewer with verbose output

C:\javaxml2\build>java javaxml2.SAXTreeViewer 
    c:\javaxml2\ch04\xml\contents.xml
Found entity with public ID null and 
    system ID file:///c:/javaxml2/ch04/xml/DTD/JavaXML.dtd
Found entity with public ID null and 
    system ID http://www.newInstance.com/javaxml2/copyright.xml

As always, the line breaks are purely for display purposes. In any case, you can see that both references in the XML document, for the DTD and the OReillyCopyright entity reference, are passed to the resolveEntity( ) method.

At this point, you might be scratching your head; a DTD is an entity? The term "entity" is a bit vague as it is used in EntityResolver. Perhaps a better name would have been ExternalReferenceResolver, but that wouldn't be much fun to type. In any case, keep in mind that any external reference in your XML is going to be passed on to this method.

So what's the point, you may be asking yourself. Remember the reference for OReillyCopyright, and how it accesses an Internet URL (http://www.newInstance.com/javaxml2/copyright.xml)? What if you don't have Internet access? What if you have a local copy you already downloaded, and want to save time by using that copy? What if you simply want to put your own copyright in place? All of these are valid questions, and real-world problems that you may have to solve in your applications. The answer, of course, is the resolveEntity( ) method I've been talking about.
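To see for yourself that a DTD reference and an entity reference both reach resolveEntity( ), here is a minimal, self-contained sketch. It is not part of SAXTreeViewer: the ExternalReferenceDemo class, the miniature document, and the example.invalid URLs are made up for illustration, and it assumes a JDK whose bundled parser loads external DTDs by default. The resolver answers every request with a stub comment so that nothing is fetched from disk or the network.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the book's example code
class ExternalReferenceDemo {

    // Made-up document with two external references:
    // the DTD in the DOCTYPE and one external entity
    static final String DOCUMENT =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE book SYSTEM \"http://example.invalid/DTD/JavaXML.dtd\" [\n"
        + "  <!ENTITY OReillyCopyright SYSTEM"
        + " \"http://example.invalid/copyright.xml\">\n"
        + "]>\n"
        + "<book>&OReillyCopyright;</book>";

    // Parses DOCUMENT and returns every system ID handed to resolveEntity()
    static List<String> systemIDsSeen() {
        List<String> seen = new ArrayList<>();
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(new DefaultHandler());
            reader.setEntityResolver((publicID, systemID) -> {
                seen.add(systemID);
                // Stub content (a comment is legal in a DTD and in element
                // content), so the parser never opens the real resource
                InputSource stub = new InputSource(
                    new StringReader("<!-- stub, nothing is fetched -->"));
                stub.setSystemId(systemID);
                return stub;
            });
            reader.parse(new InputSource(new StringReader(DOCUMENT)));
        } catch (Exception e) {
            throw new IllegalStateException("Parsing the demo document failed", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        for (String systemID : systemIDsSeen()) {
            System.out.println("Found entity with system ID " + systemID);
        }
    }
}
```

Run against this document, the resolver should be consulted once for the DTD and once for the OReillyCopyright entity, which mirrors the two lines of output in Example 4-2.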

If you return a valid InputSource (instead of null) from this method, that InputSource is used as the value for the entity reference, rather than the public or system ID specified. In other words, you can specify your own data instead of letting the reader handle resolution on its own. As an example, create a copyright.xml file on your local machine, as shown in Example 4-3.

Example 4-3. Local copy of copyright.xml

<copyright xmlns="http://www.oreilly.com">
  <year value="2001" />
  <content>This is my local version of the copyright.</content>
</copyright>

Save this in a directory you can access from your Java code (I used the same directory as my contents.xml file), and make the following change to the resolveEntity( ) method:

    public InputSource resolveEntity(String publicID, String systemID)
        throws IOException, SAXException {
         
        // Handle references to online version of copyright.xml   
        if (systemID.equals(
            "http://www.newInstance.com/javaxml2/copyright.xml")) {
            return new InputSource(
                "file:///c:/javaxml2/ch04/xml/copyright.xml");
        }
        
        // In the default case, return null
        return null;    
    }

You can see that instead of allowing resolution to the online resource, an InputSource that provides access to the local version of copyright.xml is returned. If you recompile your source file and run the tree viewer, you can visually verify that this local copy is used. Figure 4-1 shows the ora:copyright element expanded, including the local copyright document's content.

Figure 4-1. SAXTreeViewer running with local copyright.xml
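The replacement content doesn't have to live in a file, either. InputSource can also wrap a java.io.Reader, so a resolver can hand back text held in memory. The following is only a sketch under that assumption: the InMemoryCopyrightResolver class and its constants are hypothetical names, the package declaration is left out to keep the example standalone, and the markup simply repeats Example 4-3.

```java
import java.io.StringReader;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver that serves the copyright entity from a String
class InMemoryCopyrightResolver implements EntityResolver {

    static final String COPYRIGHT_URL =
        "http://www.newInstance.com/javaxml2/copyright.xml";

    // Same markup as Example 4-3, held in memory instead of on disk
    static final String LOCAL_COPYRIGHT =
        "<copyright xmlns=\"http://www.oreilly.com\">"
        + "<year value=\"2001\" />"
        + "<content>This is my local version of the copyright.</content>"
        + "</copyright>";

    @Override
    public InputSource resolveEntity(String publicID, String systemID) {
        // Handle references to the online version of copyright.xml
        if (COPYRIGHT_URL.equals(systemID)) {
            InputSource source =
                new InputSource(new StringReader(LOCAL_COPYRIGHT));
            // Keep the original system ID on the source so that error
            // messages can still name the entity
            source.setSystemId(systemID);
            return source;
        }

        // In the default case, return null
        return null;
    }
}
```

Because a character stream is supplied, the parser reads from it rather than opening the URL; the system ID is kept only as a label.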

In real-world applications, this method tends to become a lengthy laundry list of if/else blocks, each one handling a specific system or public ID. And this brings up an important point: don't let this class and method become a kitchen sink for IDs. If you no longer need a specific resolution to occur, remove the if clause for it. Additionally, try to use different EntityResolver implementations for different applications, rather than trying to use one generic implementation for all your applications. Doing this avoids code bloat, and more importantly, speeds up entity resolution. If your reader has to run through fifty or a hundred String.equals( ) comparisons for every external reference, you can bog down an application. Be sure to put frequently accessed references at the top of the if/else stack, so they are encountered first and result in quicker entity resolution.
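One way to keep the list manageable is to swap the chain of comparisons for a lookup table. This is a different technique from the if/else approach shown in this section, sketched here under the assumption of a current JDK (it uses generics, which postdate the original examples); MappedEntityResolver and its map( ) method are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical table-driven resolver: one map lookup per reference
// instead of a chain of String.equals() comparisons
class MappedEntityResolver implements EntityResolver {

    // Maps a system ID found in the document to the URI of a local copy
    private final Map<String, String> localCopies = new HashMap<>();

    // Registers a local copy; returns this so calls can be chained
    MappedEntityResolver map(String systemID, String localURI) {
        localCopies.put(systemID, localURI);
        return this;
    }

    @Override
    public InputSource resolveEntity(String publicID, String systemID) {
        String localURI = localCopies.get(systemID);

        // In the default case, return null so the parser resolves as usual
        return (localURI == null) ? null : new InputSource(localURI);
    }
}
```

Registration would then look something like reader.setEntityResolver(new MappedEntityResolver( ).map("http://www.newInstance.com/javaxml2/copyright.xml", "file:///c:/javaxml2/ch04/xml/copyright.xml")). Dropping a resolution you no longer need becomes a matter of deleting one map( ) call.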

Finally, I want to make one more recommendation concerning your EntityResolver implementations. You'll notice that I defined my implementation in a separate class file, while the ErrorHandler, ContentHandler, and (in the next section) DTDHandler implementations live in the same source file as the parsing code. That wasn't an accident! You'll find that the way you deal with content, errors, and DTDs is fairly static. You write your program, and that's it. When you do make changes, you're usually undertaking a larger rewrite and need to make big changes anyway.

However, you'll make many changes to the way you want your application to resolve entities. Depending on the machine you're on, the type of client you're deploying to, and what and where documents are available, you'll need different versions of an EntityResolver implementation. To allow for rapid changes to this implementation without editing or recompiling your core parsing code, I use a separate source file for EntityResolver implementations; I suggest you do the same. And with that, you should know all that you need to know about resolving entities in your applications using SAX.
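If you want to go one step further and change resolvers without touching the parsing code at all, the implementation class can be chosen at startup. The sketch below is one possible arrangement, not something the examples in this chapter do: the EntityResolverLoader and DeferringEntityResolver classes and the javaxml2.entityResolver property name are all assumptions made up for illustration.

```java
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical fallback that always defers to the parser, like Example 4-1
class DeferringEntityResolver implements EntityResolver {
    @Override
    public InputSource resolveEntity(String publicID, String systemID) {
        return null;
    }
}

// Hypothetical helper that picks the resolver class at startup
class EntityResolverLoader {

    // Assumed property name; any agreed-upon name would do
    static final String PROPERTY = "javaxml2.entityResolver";

    static EntityResolver load() {
        String className = System.getProperty(PROPERTY);
        if (className == null) {
            // Nothing configured: let the parser resolve everything itself
            return new DeferringEntityResolver();
        }
        try {
            // The named class needs an accessible no-argument constructor
            return Class.forName(className)
                        .asSubclass(EntityResolver.class)
                        .getDeclaredConstructor()
                        .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException(
                "Cannot create entity resolver " + className, e);
        }
    }
}
```

The parsing code would then call reader.setEntityResolver(EntityResolverLoader.load( )) once, and a different deployment could select its own resolver with something like -Djavaxml2.entityResolver=javaxml2.SimpleEntityResolver on the command line.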
