Advanced JDOM (Java & XML, 2nd Edition)

Continuing with JDOM, this chapter introduces some more advanced concepts. In the last chapter, you saw how to read and write XML using JDOM, and also got a good taste of what classes are available in the JDOM distribution. In this chapter, I drill down a little deeper to see what's going on. You'll get to see some of the classes that JDOM uses that aren't exposed in common operations, and you'll start to understand how JDOM is put together. Once you've gotten that basic understanding down, I'll move on to show you how JDOM can utilize factories and your own custom JDOM implementation classes, albeit in a totally different way than DOM. That will take you right into a fairly advanced example using wrappers and decorators, another pattern for adding functionality to the core set of JDOM classes without needing an interface-based API.

8.1. Helpful JDOM Internals

The first topic I cover is the architecture of JDOM. In Chapter 7, "JDOM", I showed you a simple UML-type model of the core JDOM classes. However, if you look closely, there are probably some things in the classes that you haven't worked with, or didn't expect. I'm going to cover those particular items in this section, showing how you can get down and dirty with JDOM.

NOTE: JDOM beta 7 was released literally days before this chapter was written. In that release, the Text class was being whiteboarded, but had not been integrated in the JDOM internals. However, this process is happening very quickly, most likely before this book gets into your hands. Even if that is not the case, it will be integrated soon after, and the issues discussed here will then apply. If you have problems with the code snippets in this section, check the version of JDOM you are using, and always try to get the newest possible release.

8.1.1. The Text Class

One class you may have been a bit surprised to see in JDOM is the Text class. If you read the last chapter, you probably caught that one large difference between DOM and JDOM is that JDOM (at least seemingly) directly exposes the textual content of an element, whereas in DOM you get the child Text node and then extract its value. What actually happens, though, is that JDOM models character-based content much like DOM does architecturally; each piece of character content is stored within a JDOM Text instance. However, when you invoke getText( ) (or getTextTrim( ) or getTextNormalize( )) on a JDOM Element instance, the instance automatically returns the value(s) in its child Text nodes:

// Get textual content
String textualContent = element.getText( );

// Get textual content, with surrounding whitespace trimmed
String trimmedContent = element.getText( ).trim( );
// or...
String alsoTrimmedContent = element.getTextTrim( );

// Get textual content, normalized (surrounding whitespace trimmed and all
//   interior whitespace compressed to a single space). For example,
//   "   this   would  be  " would be "this would be"
String normalizedContent = element.getTextNormalize( );

As a result, it commonly seems that no Text class is actually being used. The same methodology applies when invoking setText( ) on an element; the text is created as the content of a new Text instance, and that new instance is added as a child of the element. Again, the rationale is that the process of reading and writing the textual content of an XML element is such a common occurrence that it should be as simple and quick as possible.
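
To make the difference between the three accessors concrete, here is a minimal sketch that mimics them with plain String operations. The class TextAccessSketch and its method names are hypothetical stand-ins rather than JDOM code, and JDOM's own definition of whitespace may differ in edge cases:

```java
// Sketch only: plain-String stand-ins for the three accessors, not JDOM's own code.
final class TextAccessSketch {

    // Mirrors getText(): the content exactly as stored.
    static String text(String raw) {
        return raw;
    }

    // Mirrors getTextTrim(): leading and trailing whitespace removed.
    static String textTrim(String raw) {
        return raw.trim();
    }

    // Mirrors getTextNormalize(): trimmed, with each interior run of
    // whitespace collapsed to a single space.
    static String textNormalize(String raw) {
        return raw.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String raw = "   this   would  be  ";
        System.out.println("[" + text(raw) + "]");
        System.out.println("[" + textTrim(raw) + "]");
        System.out.println("[" + textNormalize(raw) + "]");
    }
}
```

The brackets in the printed output make the remaining whitespace visible; only the normalized form collapses the gaps between words.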

At the same time, as I pointed out in earlier chapters, a strict tree model makes navigation over content very simple; instanceof and recursion become easy solutions for tree explorations. Therefore, an explicit Text class, present as a child (or children) of Element instances, makes this task much easier. Further, the Text class allows extension, while java.lang.String is final and cannot be extended. For all of these reasons (and several more you can dig into on the jdom-interest mailing lists), the Text class is being added to JDOM. Even though not as readily apparent as in other APIs, it is available for these iteration-type cases. To accommodate this, if you invoke getContent( ) on an Element instance, you will get all of the content within that element. This could include Comments, ProcessingInstructions, EntityRefs, CDATA sections, and textual content. In this case, the textual content is returned as one or more Text instances rather than directly as Strings, allowing processing like this:

public void processElement(Element element) {
    List mixedContent = element.getContent( );
    for (Iterator i = mixedContent.iterator(); i.hasNext( ); ) {
        Object o = i.next( );
        if (o instanceof Text) {
            processText((Text)o);
        } else if (o instanceof CDATA) {
            processCDATA((CDATA)o);
        } else if (o instanceof Comment) {
            processComment((Comment)o);
        } else if (o instanceof ProcessingInstruction) {
            processProcessingInstruction((ProcessingInstruction)o);
        } else if (o instanceof EntityRef) {
            processEntityRef((EntityRef)o);
        } else if (o instanceof Element) {
            processElement((Element)o);
        }
    }
}

public void processComment(Comment comment) {
    // Do something with comments
}

public void processProcessingInstruction(ProcessingInstruction pi) {
    // Do something with PIs
}

public void processEntityRef(EntityRef entityRef) {
    // Do something with entity references
}

public void processText(Text text) {
    // Do something with text
}

public void processCDATA(CDATA cdata) {
    // Do something with CDATA
}

This sets up a fairly simple recursive processing of a JDOM tree. You could kick it off with simply:

// Get a JDOM Document through a builder
Document doc = builder.build(input);

// Start recursion
processElement(doc.getRootElement( ));

Because this starts at the root element, any Comment and ProcessingInstruction instances that sit at the document level (outside the root element) would need to be handled separately, but you get the idea here. You can choose to use the Text class when it makes sense, and not worry about it when it doesn't.

8.1.2. The EntityRef Class

Next up on the JDOM internals list is the EntityRef class. This is another class that you may not have to use much in common cases, but is helpful to know for special coding needs. This class represents an XML entity reference in JDOM, such as the OReillyCopyright entity reference in the contents.xml document I have been using in examples:

<ora:copyright>&OReillyCopyright;</ora:copyright>

This class allows for setting and retrieval of a name, public ID, and system ID, just as is possible when declaring the entity in an XML DTD. It can appear anywhere in a JDOM content tree, like Element and Text nodes. However, like Text nodes, an EntityRef is often a bit of a pain in the normal case. For example, in the contents.xml document, modeled in JDOM, you're usually going to be more interested in the textual value of the reference (the resolved content) than in the reference itself. In other words, when you invoke getText( ) on the copyright Element in a JDOM tree, you'd like to get "Copyright O'Reilly, 2000" or whatever other textual value is referred to by the entity reference. This is much more useful (again, in the most common cases) than getting a no-content indicator (an empty string), and then having to check for the existence of an EntityRef. For this reason, by default, all entity references are expanded when using the JDOM builders (SAXBuilder and DOMBuilder) to generate JDOM from existing XML. You will rarely see EntityRefs in this default case, because you usually don't want to mess with them. However, if you find you need to leave entity references unexpanded and represented by EntityRefs, you can use the setExpandEntities( ) method on the builder classes:

// Create new builder
SAXBuilder builder = new SAXBuilder( );

// Do not expand entity references (default is to expand these)
builder.setExpandEntities(false);

// Build the tree with EntityRef objects (if needed, of course)
Document doc = builder.build(inputStream);

In this case, you may have EntityRef instances in the tree (if you were using the contents.xml document, for example). And you can always create EntityRefs directly and place them in the JDOM tree:

// Create new entity reference
EntityRef ref = new EntityRef("TrueNorthGuitarsTagline");
ref.setSystemID("tngTagline.xml");

// Insert into the tree
tagLineElement.addContent(ref);

When serializing this tree, you get XML like this:

<guitar>
  <tagLine>&TrueNorthGuitarsTagline;</tagLine>
</guitar>

And when reading the document back in using a builder, the resulting JDOM Document would depend on the expandEntities flag. If it is set to false, you'd get the original EntityRef back again with the correct name and system ID. With this value set to true (the default), you'd get the resolved content. A second serialization might result in:

<guitar>
  <tagLine>two hands, one heart</tagLine>
</guitar>

While this may seem like a lot of fuss over something simple, it's important to realize that whether or not entities are expanded can change the input and output XML you are working with. Always keep track of how the builder flags are set, and what you want your JDOM tree and XML output to look like.

8.1.3. The Namespace Class

I want to briefly cover one more JDOM class, the Namespace class. This class acts as both an instance variable and a factory within the JDOM architecture. When you need to create a new namespace, either for an element or for searching, you use the static getNamespace( ) methods on this class:

// Create namespace with prefix
Namespace schemaNamespace = 
    Namespace.getNamespace("xsd", "http://www.w3.org/2001/XMLSchema");

// Create namespace without prefix
Namespace javaxml2Namespace =
    Namespace.getNamespace("http://www.oreilly.com/javaxml2");

As you can see, there is a version for creating namespaces with prefixes and one for creating namespaces without prefixes (default namespaces). Either version can be used, then supplied to the various JDOM methods:

// Create element with namespace
Element schema = new Element("schema", schemaNamespace);

// Search for children in the specified namespace
List chapterElements = contentElement.getChildren("chapter", javaxml2Namespace);

// Declare a new namespace on this element
catalogElement.addNamespaceDeclaration(
    Namespace.getNamespace("tng", "http://www.truenorthguitars.com"));

These are all fairly self-explanatory. Also, when XML serialization is performed with the various outputters (SAXOutputter, DOMOutputter, and XMLOutputter), the namespace declarations are automatically handled and added to the resulting XML.

One final note: in JDOM, namespace comparison is based solely on URI. In other words, two Namespace objects are equal if their URIs are equal, regardless of prefix. This is in keeping with the letter and spirit of the XML Namespace specification, which indicates that two elements are in the same namespace if their URIs are identical, regardless of prefix. Look at this XML document fragment:

<guitar xmlns="http://www.truenorthguitars.com">
  <ni:owner xmlns:ni="http://www.newInstance.com">
    <ni:name>Brett McLaughlin</ni:name>
    <tng:model xmlns:tng="http://www.truenorthguitars.com">Model 1</tng:model>
    <backWood>Madagascar Rosewood</backWood>
  </ni:owner>
</guitar>

Even though they have varying prefixes, the elements guitar, model, and backWood are all in the same namespace. This holds true in the JDOM Namespace model, as well. In fact, the Namespace class's equals( ) method returns true whenever two namespaces have the same URI, regardless of prefix.
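
The URI-only comparison can be sketched with a small stand-in class. NamespaceSketch below is hypothetical and is not JDOM's Namespace source; it only illustrates the rule described here (static factory methods, and equality decided by URI alone), using the URIs from the fragment above:

```java
// Hypothetical stand-in for illustration; this is not JDOM's Namespace class.
final class NamespaceSketch {
    private final String prefix;
    private final String uri;

    private NamespaceSketch(String prefix, String uri) {
        this.prefix = prefix;
        this.uri = uri;
    }

    // Factory for a namespace with a prefix.
    static NamespaceSketch getNamespace(String prefix, String uri) {
        return new NamespaceSketch(prefix, uri);
    }

    // Factory for a default (unprefixed) namespace.
    static NamespaceSketch getNamespace(String uri) {
        return new NamespaceSketch("", uri);
    }

    String getPrefix() {
        return prefix;
    }

    String getURI() {
        return uri;
    }

    // Equality looks only at the URI; the prefix is deliberately ignored.
    @Override
    public boolean equals(Object other) {
        return other instanceof NamespaceSketch
            && uri.equals(((NamespaceSketch) other).uri);
    }

    // Kept consistent with equals(): hash on the URI only.
    @Override
    public int hashCode() {
        return uri.hashCode();
    }

    public static void main(String[] args) {
        NamespaceSketch guitarNs = getNamespace("http://www.truenorthguitars.com");
        NamespaceSketch modelNs = getNamespace("tng", "http://www.truenorthguitars.com");
        NamespaceSketch ownerNs = getNamespace("ni", "http://www.newInstance.com");

        System.out.println(guitarNs.equals(modelNs)); // same URI, different prefix
        System.out.println(guitarNs.equals(ownerNs)); // different URI
    }
}
```

Run as written, the first comparison prints true and the second prints false, which matches the way guitar and tng:model share a namespace in the fragment while ni:owner does not.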

I've touched on only three of the JDOM classes, but these are the classes that are tricky and most commonly asked about. The rest of the API was covered in the previous chapter, and is reinforced in the next sections of this chapter. You should be able to easily deal with textual content, entity references, and namespaces in JDOM now, converting between Strings and Text nodes, resolved content and EntityRefs, and multiple-prefixed namespaces with ease. With that understanding, you're ready to move on to some more complex examples and cases.
