Advanced DOM (Java & XML, 2nd Edition)

Just like in Chapter 4, "Advanced SAX", there's nothing mystical about anything I'll cover in this chapter. The topics build upon a foundation that I set in the DOM basics from the last chapter. However, with the exception of the first section on mutation, many of these features are rarely used. While almost everything you've seen in SAX (except, perhaps, the DTDHandler and DeclHandler) will be handy, I've found many of the fringe features of DOM useful only in specific applications. For example, if you aren't doing any presentation logic, you'll probably never touch the DOM HTML bindings. The same goes for many of DOM Level 2's features; if you need them, you need them badly, and if you don't, you really don't.

In this chapter, I'll present some specific DOM topics that will be useful in your own DOM programming. I've tried to organize the chapter more like a reference than the previous chapters; if you want to find out more about the DOM Level 2 Traversal module, for example, you can simply thumb to that section. However, the code examples in this chapter do build upon each other, so you may still want to work through each section in order to get a complete picture of the current DOM model. This results in more practical code samples, rather than useless contrived ones that won't get you anywhere. So buckle up, and let's dive a little deeper into the world of DOM.

6.1. Changes

First and foremost, I want to talk about the mutability of a DOM tree. The biggest limitation when using SAX for dealing with XML is that you cannot change any of the XML structure you encounter, at least not without using filters and writers. Those aren't intended to be used for wholesale document changes anyway, so you'll need to use another API when you want to modify XML. DOM fits the bill nicely, as it provides XML creation and modification facilities.

In working with DOM, the process of creating an XML document is quite different from changing an existing one, so I'll take them one at a time. This section gives you a fairly realistic example to mull over. If you've ever been to an online auction site like eBay, you know that the most important aspects of the auction are the ability to find items, and the ability to find out about items. These functions depend on a user entering in a description of an item, and the auction using that information. The better auction sites allow users to enter in some basic information as well as actual HTML descriptions, which means the savvy user can bold, italicize, link, and add other formatting to their items' descriptions. This provides a good case for using DOM.

6.1.1. Creating a New DOM Tree

To get started, a little bit of groundwork is needed. Example 6-1 shows a simple HTML form that takes basic information about an item to be listed on an auction site. This would obviously be dressed up more for a real site, but you get the idea.

Example 6-1. HTML input form for item listing

<html>
 <head><title>Input/Update Item Listing</title></head>
 <body>
  <h1 align="center">Input/Update Item Listing</h1>
  <p align="center">
   <form method="POST" action="/javaxml2/servlet/javaxml2.UpdateItemServlet">
    Item ID (Unique Identifier): <br />
    <input name="id" type="text" maxLength="10" /><br /><br />
    Item Name: <br />
    <input name="name" type="text" maxLength="50" /><br /><br />
    Item Description: <br />
    <textarea name="description" rows="10" cols="30" wrap="wrap" ></textarea>
    <br /><br />
    <input type="reset" value="Reset Form" />&nbsp;&nbsp;
    <input type="submit" value="Add/Update Item" />
   </form>
  </p>
 </body>
</html>

Notice that the target of this form submission is a servlet. That servlet is shown in Example 6-2. The doPost( ) method reads in these input parameters and puts their values into temporary variables. At that point, the servlet checks the filesystem for a specific file that has this information stored within it.

WARNING: For the sake of clarity, I'm dealing directly with the filesystem in this servlet. However, this is generally not a good idea. Consider using the ServletContext to get access to local resources, allowing your servlet to be distributed and modified easily depending on the server and servlet engine hosting it. That sort of detail tends to muddy examples up, so I'm keeping it simple here.

If the file doesn't exist (for a new listing, it wouldn't), it creates a new DOM tree and builds up the tree structure using the values supplied. Once that's complete, the servlet uses the DOMSerializer class (from Chapter 5, "DOM") to write the DOM tree out to the file, making it available the next time this servlet is invoked. Additionally, I've coded up a doGet( ) method; this method just displays the HTML shown in Example 6-1. I'll use this later to allow modification of item listings. For now, don't worry too much about it.

Example 6-2. The UpdateItemServlet class

package javaxml2;

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// DOM imports
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

// Parser import
import org.apache.xerces.dom.DOMImplementationImpl;

public class UpdateItemServlet extends HttpServlet {

    private static final String ITEMS_DIRECTORY = "/javaxml2/ch06/xml/";

    public void doGet(HttpServletRequest req, HttpServletResponse res)
        throws ServletException, IOException {

        // Get output
        PrintWriter out = res.getWriter( );
        res.setContentType("text/html");

        // Output HTML        
        out.println("<html>");
        out.println(" <head><title>Input/Update Item Listing</title></head>");
        out.println(" <body>");
        out.println("  <h1 align='center'>Input/Update Item Listing</h1>");
        out.println("  <p align='center'>");
        out.println("   <form method='POST' " +
            "action='/javaxml2/servlet/javaxml2.UpdateItemServlet'>");
        out.println("    Item ID (Unique Identifier): <br />");
        out.println("    <input name='id' type='text' maxLength='10' />" +
            "<br /><br />");
        out.println("    Item Name: <br />");
        out.println("    <input name='name' type='text' maxLength='50' />" +
            "<br /><br />");
        out.println("    Item Description: <br />");
        out.println("    <textarea name='description' rows='10' cols='30' " +
            "wrap='wrap' ></textarea><br /><br />");
        out.println("    <input type='reset' value='Reset Form' />&nbsp;&nbsp;");
        out.println("    <input type='submit' value='Add/Update Item' />");
        out.println("   </form>");
        out.println("  </p>");
        out.println(" </body>");
        out.println("</html>");
 
        out.close( );
    }

    public void doPost(HttpServletRequest req, HttpServletResponse res)
        throws ServletException, IOException {

        // Get parameter values
        String id = req.getParameterValues("id")[0];
        String name = req.getParameterValues("name")[0];
        String description = req.getParameterValues("description")[0];

        // Create new DOM tree
        DOMImplementation domImpl = new DOMImplementationImpl( );
        Document doc = domImpl.createDocument(null, "item", null);
        Element root = doc.getDocumentElement( );

        // ID of item (as attribute)
        root.setAttribute("id", id);

        // Name of item
        Element nameElement = doc.createElement("name");
        Text nameText = doc.createTextNode(name);
        nameElement.appendChild(nameText);
        root.appendChild(nameElement);

        // Description of item
        Element descriptionElement = doc.createElement("description");
        Text descriptionText = doc.createTextNode(description);
        descriptionElement.appendChild(descriptionText);
        root.appendChild(descriptionElement);

        // Serialize DOM tree
        DOMSerializer serializer = new DOMSerializer( );
        serializer.serialize(doc, new File(ITEMS_DIRECTORY + "item-" + id +
            ".xml"));

        // Print confirmation
        PrintWriter out = res.getWriter( );
        res.setContentType("text/html");
        out.println("<HTML><BODY>Thank you for your submission. " +
            "Your item has been processed.</BODY></HTML>");
        out.close( );        
    }

}

Go ahead and compile this class. I'll walk you through it in just a moment, but ensure that you have your environment set up to include the needed classes.

NOTE: Make sure the DOMSerializer class from the last chapter is in your classpath when compiling the UpdateItemServlet class. You'll also want to add this to the classes in your servlet engine's context. In my setup, using Tomcat, my context is called javaxml2, in a directory named javaxml2 under the webapps directory. In my WEB-INF/classes directory, there is a javaxml2 directory (for the package), and then the DOMSerializer.class and UpdateItemServlet.class files are within that directory. You should also ensure that a copy of your parser's jar file (xerces.jar in my case) is in the classpath of your engine. In Tomcat, you can simply drop a copy in Tomcat's lib directory. Finally, you'll need to ensure that Xerces, and the DOM Level 2 implementation within it, is loaded before the DOM Level 1 implementation in Tomcat's parser.jar archive. Do this by renaming parser.jar to z_parser.jar. I'll explain more about this in Chapter 10, "Web Publishing Frameworks", but for now just trust me and make the change. Then restart Tomcat and everything should work.

Once you've got your servlet in place and the servlet engine started, browse to the servlet and let the GET request your browser generates load the HTML input form. Fill this form out, as I have in Figure 6-1.

Figure 6-1. Filling out the items form

Since I'll talk in depth about the description field later, I want to show you the complete content I typed into that field. I know there's lots of markup (I went crazy on the bolding and italics!), but this will be important later on:

This is a <i>beautiful</i> <b>Sitka-topped</b> guitar with <b>Indian Rosewood</b> 
back and sides. Made by luthier <a href="http://www.bourgeoisguitars.com">Dana 
Bourgeois</a>, this OM has a <b>huge sound</b>.
The guitar has <i>great action</i>, a 1 3/4" nut, and all 
<i>fossilized ivory</i> nut and saddle, with <i>ebony</i> end pins.
New condition, this is a <b>great guitar</b>!

Submitting this form posts its data (via a POST request) to the servlet, and the doPost( ) method takes effect. As for the actual DOM creation, it turns out to be pretty simple. First, you'll need to obtain an instance of the org.w3c.dom.DOMImplementation interface. This will be the base for all your DOM creation work. While you could certainly directly instantiate a DOM Document implementation, you would not be able to create a DocumentType from it as you could from a DOMImplementation; using DOMImplementation is a better practice. Additionally, the DOMImplementation interface has one more useful method, hasFeature( ). I'll cover this method in detail later, so don't worry about it for now. In the example code, I've used Xerces' implementation, org.apache.xerces.dom.DOMImplementationImpl (sort of a confusing name, isn't it?). There is currently no vendor-neutral way to handle this, although DOM Level 3 (covered at the end of this chapter) provides some possibilities for the future. JAXP, detailed in Chapter 9, "JAXP", offers some solutions, but I'll get to those later.
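As a preview of the JAXP route mentioned above, here is a minimal sketch of obtaining a DOMImplementation without naming a parser-specific class. It assumes a JAXP-capable parser is available at runtime (as it is in current JDKs); the class name DomImplSketch and the helper getImplementation( ) are made up for illustration and are not part of the servlet.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;

public class DomImplSketch {

    // Hypothetical helper: ask JAXP for whatever DOMImplementation
    // the configured parser provides
    static DOMImplementation getImplementation() {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable JAXP parser found", e);
        }
    }

    public static void main(String[] args) {
        DOMImplementation domImpl = getImplementation();

        // hasFeature() reports whether a given DOM module/version is supported
        System.out.println("Core 2.0 supported: "
            + domImpl.hasFeature("Core", "2.0"));

        // Same call the servlet makes: no namespace, root "item", no doctype
        Document doc = domImpl.createDocument(null, "item", null);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

The rest of the tree-building code is unaffected by this choice, since it only talks to the org.w3c.dom interfaces.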

Once you've got an instance of DOMImplementation, though, things are pretty simple. Take a look at the relevant code again:

        // Create new DOM tree
        DOMImplementation domImpl = new DOMImplementationImpl( );
        Document doc = domImpl.createDocument(null, "item", null);
        Element root = doc.getDocumentElement( );

        // ID of item (as attribute)
        root.setAttribute("id", id);

        // Name of item
        Element nameElement = doc.createElement("name");
        Text nameText = doc.createTextNode(name);
        nameElement.appendChild(nameText);
        root.appendChild(nameElement);

        // Description of item
        Element descriptionElement = doc.createElement("description");
        Text descriptionText = doc.createTextNode(description);
        descriptionElement.appendChild(descriptionText);
        root.appendChild(descriptionElement);

        // Serialize DOM tree
        DOMSerializer serializer = new DOMSerializer( );
        serializer.serialize(doc, new File(ITEMS_DIRECTORY + "item-" + id + 
            ".xml"));

First, the createDocument( ) method is used to get a new Document instance. The first argument to this method is the namespace URI for the document's root element. I haven't gotten to namespaces yet, so I omit one by passing in a null value. The second argument is the name of the root element itself, which is simply "item". The last argument is an instance of DocumentType, and I again pass in a null value since I have none for this document. If I did want a DocumentType, I could create one with the createDocumentType( ) method on the same interface, DOMImplementation. If you're interested in that method, check out the complete DOM API coverage in Appendix A, "API Reference".
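If you did want a document type, the call might look like the following sketch. In the DOM Level 2 interface the method is named createDocumentType( ) and takes a qualified name, a public ID, and a system ID. The DTD location "DTD/item.dtd" is purely hypothetical, and the DOMImplementation is obtained through JAXP here only to keep the sketch self-contained.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

public class DocTypeSketch {

    static Document newItemDocument() {
        DOMImplementation domImpl;
        try {
            domImpl = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }

        // Qualified name, public ID, system ID.
        // "DTD/item.dtd" is a made-up location for illustration.
        DocumentType docType =
            domImpl.createDocumentType("item", null, "DTD/item.dtd");

        // Namespace URI, root element name, document type
        return domImpl.createDocument(null, "item", docType);
    }

    public static void main(String[] args) {
        Document doc = newItemDocument();
        System.out.println(doc.getDoctype().getName());
        System.out.println(doc.getDoctype().getSystemId());
    }
}
```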

With a DOM tree to operate upon, I can retrieve the root element to work with (using getDocumentElement( ), covered in the last chapter). Once I've got that, I add an attribute with the ID of the item using setAttribute( ). I pass in the attribute name and value, and the root element is ready to go. Things begin to get simple now; each type of DOM construct can be created using the Document object as a factory. To create the "name" and "description" elements, I use the createElement( ) method, simply passing in the element name in each case. The same approach is used to create textual content for each; since an element has no content but instead has children that are Text nodes (remember this from the last chapter?), the createTextNode( ) method is the right selection. This method takes the text for the node: the item name in one case and the description in the other. Since there is HTML within the description, you might be tempted to use the createCDATASection( ) method and wrap this text in a CDATA section. However, that would prevent the content from being read back in as a set of elements, and would instead provide it as one big blob of text. Later on, we'll want to deal with this as elements, so leave this as a Text node instead, using createTextNode( ) again. Once you've gotten all of these nodes created, all that's left is to link them together. Your best bet is to use appendChild( ) on each, appending the elements to the root, and the textual content of the elements to the correct parent. This is pretty self-explanatory. And finally, the whole document is passed into the DOMSerializer class from the last chapter and written out to an XML file on disk.
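The factory pattern described above can be tried outside of a servlet with a sketch like this one. It builds the same three-node structure and then compares the node type produced by createTextNode( ) with the one produced by createCDATASection( ). The class and method names are invented for illustration, and JAXP stands in for the Xerces class used in the servlet.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class ItemTreeSketch {

    static Document buildItem(String id, String name, String description) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .getDOMImplementation()
                .createDocument(null, "item", null);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }

        Element root = doc.getDocumentElement();
        root.setAttribute("id", id);

        // The Document is the factory for every node; appendChild() links them
        Element nameElement = doc.createElement("name");
        nameElement.appendChild(doc.createTextNode(name));
        root.appendChild(nameElement);

        Element descriptionElement = doc.createElement("description");
        descriptionElement.appendChild(doc.createTextNode(description));
        root.appendChild(descriptionElement);

        return doc;
    }

    public static void main(String[] args) {
        String html = "This is a <b>great guitar</b>!";
        Document doc = buildItem("bourgOM", "Bourgeois OM Guitar", html);

        // The description's only child is a Text node
        Node text = doc.getDocumentElement().getLastChild().getFirstChild();
        System.out.println(text.getNodeType() == Node.TEXT_NODE);

        // For comparison: a CDATA section is a different node type
        Node cdata = doc.createCDATASection(html);
        System.out.println(cdata.getNodeType() == Node.CDATA_SECTION_NODE);
    }
}
```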

WARNING: I have assumed that the user is entering well-formed HTML; in other words, XHTML. In a production application you would probably run this input through JTidy (http://www.sourceforge.net/projects/jtidy) to ensure this; for this example, I'll just assume the input is XHTML.
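Short of bringing in a tool like JTidy, one lightweight safeguard is to test whether the submitted description is at least well-formed before storing it. The sketch below is one possible approach, not something the servlet does: it wraps the fragment in a throwaway root element and hands it to a JAXP parser. The class name and the isWellFormed( ) helper are hypothetical. Note that this only checks well-formedness; it does not repair anything, and HTML entities such as &nbsp; would be rejected because no DTD declares them.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class FragmentCheckSketch {

    // Hypothetical helper: true if the fragment parses as element content
    static boolean isWellFormed(String fragment) {
        // Wrap the fragment so mixed text and elements have a single root
        String wrapped = "<description>" + fragment + "</description>";
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors rather than printing them
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(wrapped)));
            return true;
        } catch (SAXException e) {
            // Any parse failure means the fragment is not well-formed
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("a <b>great guitar</b>!")); // true
        System.out.println(isWellFormed("a <b>great guitar!"));     // false
    }
}
```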

I've provided a constant in the servlet, ITEMS_DIRECTORY, where you can specify what directory to use. The example code uses a Unix-style path with forward slashes; if you use a Windows path with backslashes instead, each backslash must be escaped in the Java string. Don't forget this! Simply change this to the directory you want to use on your system. You can view the XML generated from the servlet by browsing to the directory you specified in this constant and opening the XML file that should be located there. Mine looked as shown in Example 6-3.

Example 6-3. The XML generated from the UpdateItemServlet

<?xml version="1.0"?>
<item id="bourgOM">
<name>Bourgeois OM Guitar</name>
<description>This is a <i>beautiful</i> <b>Sitka-topped</b> guitar with 
<b>Indian Rosewood</b> back and sides. Made by luthier 
<a href="http://www.bourgeoisguitars.com">Dana Bourgeois</a>, this OM has a 
<b>huge sound</b>. 
The guitar has <i>great action</i>, a 1 3/4" nut, and all 
<i>fossilized ivory</i> nut and saddle, with <i>ebony</i> end pins.
New condition, this is a <b>great guitar</b>!</description>
</item>
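A side note on the ITEMS_DIRECTORY constant: rather than concatenating the directory and filename by hand, you can let java.io.File join them. This is only a sketch of that alternative; the class name and the itemFile( ) helper are invented for illustration.

```java
import java.io.File;

public class ItemPathSketch {

    // Same role as ITEMS_DIRECTORY in the servlet; change it for your system.
    // A Windows literal would need doubled backslashes, for example
    // "c:\\javaxml2\\ch06\\xml\\".
    private static final String ITEMS_DIRECTORY = "/javaxml2/ch06/xml/";

    // Hypothetical helper: the file that holds the item with this ID
    static File itemFile(String id) {
        // File(parent, child) supplies the platform's separator as needed
        return new File(ITEMS_DIRECTORY, "item-" + id + ".xml");
    }

    public static void main(String[] args) {
        File file = itemFile("bourgOM");
        System.out.println(file.getPath());
        System.out.println(file.exists());
    }
}
```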

I've moved fairly quickly through this, but you should be starting to really catch your stride with DOM. Next, I want to discuss actually modifying a DOM tree that is already in existence.

6.1.2. Modifying a DOM Tree

The process of changing an existing DOM tree is slightly different from the process of creating one; in general, it involves loading the DOM from some source, traversing the tree, and then making changes. These changes are usually either to structure or content. If the change is to structure, it becomes a matter of creation again:

// Add a copyright element to the root
Element root = doc.getDocumentElement( );
Element copyright = doc.createElement("copyright");
copyright.appendChild(doc.createTextNode("Copyright O'Reilly 2001"));
root.appendChild(copyright);

This is what I just described. The process of changing existing content is a little different, although not overly complex. As an example, I will show you a modified version of the UpdateItemServlet. This version reads the supplied ID and tries to load an existing file if it exists. If so, it doesn't create a new DOM tree, but instead modifies the existing one. Since there are so many additions, I'll reprint the entire class and highlight the changes:

package javaxml2;

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.xml.sax.SAXException;

// DOM imports
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;

// Parser import
import org.apache.xerces.dom.DOMImplementationImpl;
import org.apache.xerces.parsers.DOMParser;

public class UpdateItemServlet extends HttpServlet {

    private static final String ITEMS_DIRECTORY = "/javaxml2/ch06/xml/";

    // doGet( ) method is unchanged

    public void doPost(HttpServletRequest req, HttpServletResponse res)
        throws ServletException, IOException {

        // Get parameter values
        String id = req.getParameterValues("id")[0];
        String name = req.getParameterValues("name")[0];
        String description = req.getParameterValues("description")[0];

        // See if this file exists
        Document doc = null;
        File xmlFile = new File(ITEMS_DIRECTORY + "item-" + id + ".xml");

        if (!xmlFile.exists( )) {
            // Create new DOM tree
            DOMImplementation domImpl = new DOMImplementationImpl( );
            doc = domImpl.createDocument(null, "item", null);
            Element root = doc.getDocumentElement( );

            // ID of item (as attribute)
            root.setAttribute("id", id);

            // Name of item
            Element nameElement = doc.createElement("name");
            Text nameText = doc.createTextNode(name);
            nameElement.appendChild(nameText);
            root.appendChild(nameElement);

            // Description of item
            Element descriptionElement = doc.createElement("description");
            Text descriptionText = doc.createTextNode(description);
            descriptionElement.appendChild(descriptionText);
            root.appendChild(descriptionElement);
        } else {
            // Load document
            try {
                DOMParser parser = new DOMParser( );
                parser.parse(xmlFile.toURL().toString( ));
                doc = parser.getDocument( );

                Element root = doc.getDocumentElement( );
   
                // Name of item
                NodeList nameElements = 
                    root.getElementsByTagName("name");
                Element nameElement = (Element)nameElements.item(0);
                Text nameText = (Text)nameElement.getFirstChild( );
                nameText.setData(name);
            
                // Description of item
                NodeList descriptionElements = 
                    root.getElementsByTagName("description");
                Element descriptionElement = (Element)descriptionElements.item(0);

                // Remove and recreate description
                root.removeChild(descriptionElement);
                descriptionElement = doc.createElement("description");
                Text descriptionText = doc.createTextNode(description);
                descriptionElement.appendChild(descriptionText);
                root.appendChild(descriptionElement);
            } catch (SAXException e) {
                // Print error
                PrintWriter out = res.getWriter( );
                res.setContentType("text/html");
                out.println("<HTML><BODY>Error in reading XML: " +
                    e.getMessage( ) + ".</BODY></HTML>");
                out.close( ); 
                return;
            }
        }

        // Serialize DOM tree
        DOMSerializer serializer = new DOMSerializer( );
        serializer.serialize(doc, xmlFile);

        // Print confirmation
        PrintWriter out = res.getWriter( );
        res.setContentType("text/html");
        out.println("<HTML><BODY>Thank you for your submission. " +
            "Your item has been processed.</BODY></HTML>");
        out.close( );        
    }
}

The changes are fairly simple, nothing to throw you for a loop. I create the File instance for the named file (using the ID supplied), and check for its existence. This tells the servlet whether the XML file representing the submitted item already exists. If not, it does everything discussed in the last section, with no changes. If the XML already exists (indicating the item has already been submitted), it is loaded and read into a DOM tree using techniques covered in the last chapter. At that point, some basic tree traversal begins.

The code grabs the root element, and then uses the getElementsByTagName( ) method to locate all elements named "name" and then all named "description." In each case, I know that only one will be found within the returned NodeList. I can access this using the item( ) method on the NodeList, and supplying "0" as the argument (the indexes are all zero-based). This effectively gives me the element desired. I could have simply gotten the children of the root through getChildNodes( ), and peeled off the first and second. However, using the element names is easier to document and clearer. I get the "name" element's textual content by invoking getFirstChild( ). Since I know that the "name" element has a single Text node, I can directly cast this to the appropriate type. Finally, the setData( ) method allows the code to replace the existing value with the new name, which is the information the user supplied through the form.
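To try this lookup-and-update step in isolation, here is a small sketch that parses an item from a string instead of a file and renames it. It assumes the same document layout the servlet writes; the class name, the parse( ) helper, and the rename( ) method are made up, and JAXP replaces the Xerces DOMParser so the sketch runs on a plain JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class RenameItemSketch {

    // Hypothetical helper: parse an XML string into a DOM tree
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static void rename(Document doc, String newName) {
        Element root = doc.getDocumentElement();

        // Only one "name" element is expected, so take index 0
        NodeList nameElements = root.getElementsByTagName("name");
        Element nameElement = (Element) nameElements.item(0);

        // The cast is safe only because "name" is known to hold a single Text node
        Text nameText = (Text) nameElement.getFirstChild();
        nameText.setData(newName);
    }

    public static void main(String[] args) {
        Document doc = parse("<item id=\"bourgOM\"><name>Bourgeois OM</name>"
            + "<description>A great guitar</description></item>");
        rename(doc, "Bourgeois OM Guitar");
        System.out.println(doc.getElementsByTagName("name").item(0)
            .getFirstChild().getNodeValue());
    }
}
```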

You'll notice that I used a slightly different approach for the description of the item. Since there could conceivably be a complete document fragment within the element (remember the user could enter HTML, allowing for nested elements like "b", "a", and "img"), it's easier to just remove the existing "description" element and replace it with a new one. This avoids having to recurse through the tree and remove each child node, a time-consuming task. Once I've removed the node using the removeChild( ) method, it's simple to recreate and reappend it to the document's root element.
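The remove-and-recreate approach can be sketched on its own as follows. The starting description deliberately contains a nested "b" element, to show that removeChild( ) takes the whole subtree with it. As before, the class and helper names are hypothetical and the input comes from a string rather than a file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ReplaceDescriptionSketch {

    // Hypothetical helper: parse an XML string into a DOM tree
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static void replaceDescription(Document doc, String description) {
        Element root = doc.getDocumentElement();
        Element oldDescription =
            (Element) root.getElementsByTagName("description").item(0);

        // Removing the element discards all of its children in one step
        root.removeChild(oldDescription);

        // Recreate the element and append it to the root again
        Element newDescription = doc.createElement("description");
        newDescription.appendChild(doc.createTextNode(description));
        root.appendChild(newDescription);
    }

    public static void main(String[] args) {
        Document doc = parse("<item id=\"bourgOM\"><name>Bourgeois OM Guitar</name>"
            + "<description>A <b>great</b> guitar</description></item>");
        replaceDescription(doc, "New condition, barely played");
        System.out.println(doc.getElementsByTagName("b").getLength());
        System.out.println(doc.getElementsByTagName("description").item(0)
            .getFirstChild().getNodeValue());
    }
}
```

The DOM Node interface also offers replaceChild( ), which would swap the old element for the new one in a single call; the two-step version here simply mirrors the servlet.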

It's no accident that this code is hardwired to the format the XML was written out to. In fact, most DOM modification code relies on at least some understanding of the content to be dealt with. For cases when the structure or format is unknown, the DOM Level 2 traversal model is a better fit; I'll cover that a little later on in this chapter. For now, accept that knowing how the XML is structured (since this servlet created it earlier on!) is a tremendous advantage. Methods like getFirstChild( ) can be used and the result cast to a specific type, rather than needing lengthy type checking and switch blocks.
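For contrast, here is roughly what the "lengthy type checking and switch blocks" look like when you can't assume anything about the structure. This is only a sketch with invented names (NodeTypeSketch, describe( )); it inspects each child of the root and branches on its node type instead of casting directly.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class NodeTypeSketch {

    // Hypothetical helper: parse an XML string into a DOM tree
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Branch on the node type rather than assuming what the node is
    static String describe(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                return "element <" + node.getNodeName() + ">";
            case Node.TEXT_NODE:
                return "text \"" + node.getNodeValue().trim() + "\"";
            case Node.CDATA_SECTION_NODE:
                return "CDATA section";
            case Node.COMMENT_NODE:
                return "comment";
            default:
                return "other (type " + node.getNodeType() + ")";
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<item id=\"bourgOM\">"
            + "<name>Bourgeois OM Guitar</name><!-- listing -->"
            + "<description>A great guitar</description></item>");

        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            System.out.println(describe(children.item(i)));
        }
    }
}
```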

Once the creation or modification is complete, the resulting DOM tree is serialized back to XML, and the process can repeat itself. I've also had to add some error handling for SAX problems resulting from the DOM parsing, but this is also nothing new after the last chapter. As an exercise, update the doGet( ) method to read in a parameter from the URL and load the XML for that item, letting the user change its values on the form. For example, the URL http://localhost:8080/javaxml2/servlet/javaxml2.UpdateItemServlet?id=bourgOM would indicate that the item with the ID "bourgOM" should be loaded for editing. This is a simple change, and one you should be ready to knock out on your own by now.
