public void serializeNode(Node node, Writer writer,
                          String indentLevel)
    throws IOException {

    // Determine action based on node type
    switch (node.getNodeType( )) {
        case Node.DOCUMENT_NODE:
            break;
        case Node.ELEMENT_NODE:
            break;
        case Node.TEXT_NODE:
            break;
        case Node.CDATA_SECTION_NODE:
            break;
        case Node.COMMENT_NODE:
            break;
        case Node.PROCESSING_INSTRUCTION_NODE:
            break;
        case Node.ENTITY_REFERENCE_NODE:
            break;
        case Node.DOCUMENT_TYPE_NODE:
            break;
    }
}
This skeleton does nothing useful yet; however, it helps to see all of
the DOM node types laid out in one place, rather than mixed in with all
of the code needed to perform actual serialization. I want to get to
that now, though, starting with the first node passed into this
method, an instance of the Document interface.
The PI node in the DOM is a little bit of a break from what you have
seen so far: to fit the syntax into the Node interface model, the
getNodeValue( ) method returns all of the data within a PI as a single
String. This allows quick output of the PI; however, you still need to
use getNodeName( ) to get the name (the target) of the PI. If you were
writing an application that received PIs from an XML document, you
might prefer to use the actual ProcessingInstruction interface;
although it exposes the same data, its method names (getTarget( ) and
getData( )) are more in line with a PI's format. With this
understanding, you can add the code to print out any PIs in supplied
XML documents:
        case Node.PROCESSING_INSTRUCTION_NODE:
            writer.write("<?" + node.getNodeName( ) +
                         " " + node.getNodeValue( ) +
                         "?>");
            writer.write(lineSeparator);
            break;
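To see that the two views of a PI really do carry the same data, here is a minimal sketch that parses a small document and reads its PI both ways. The class name PIAccessDemo, the helper firstPI( ), and the sample XML are made up for illustration; only the DOM and JAXP calls are standard.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

class PIAccessDemo {

    // Hypothetical sample document with one top-level PI
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<?xml-stylesheet href=\"style.xsl\" type=\"text/xsl\"?>"
        + "<root/>";

    /** Parses the XML and returns the first top-level PI, or null if none. */
    static ProcessingInstruction firstPI(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            // Walk the direct children of the Document node
            for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
                    return (ProcessingInstruction) n;
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        ProcessingInstruction pi = firstPI(XML);
        // Generic Node view: name and value
        System.out.println(pi.getNodeName() + " | " + pi.getNodeValue());
        // PI-specific view: target and data
        System.out.println(pi.getTarget() + " | " + pi.getData());
    }
}
```

Both lines printed by main( ) should be identical, which is why the serializer can get by with the generic Node methods.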
While the code to deal with PIs is perfectly workable, there is a
problem. In the case that handled document nodes, all the serializer
did was pull out the document element and recurse. That approach
ignores every other child node of the Document object, such as
top-level PIs and any DOCTYPE declaration. Those nodes are siblings of
the document element (root element) rather than its children, so they
are never visited. Instead of just pulling out the document element,
then, the following code serializes all child nodes of the supplied
Document object:
        case Node.DOCUMENT_NODE:
            // The XML declaration uses PI-style <? ... ?> delimiters
            writer.write("<?xml version=\"1.0\"?>");
            writer.write(lineSeparator);

            // recurse on each child
            NodeList nodes = node.getChildNodes( );
            if (nodes != null) {
                for (int i=0; i<nodes.getLength( ); i++) {
                    serializeNode(nodes.item(i), writer, "");
                }
            }
            /*
            Document doc = (Document)node;
            serializeNode(doc.getDocumentElement( ), writer, "");
            */
            break;
        case Node.DOCUMENT_TYPE_NODE:
            DocumentType docType = (DocumentType)node;
            writer.write("<!DOCTYPE " + docType.getName( ));
            // A public ID, if present, precedes the system ID
            if (docType.getPublicId( ) != null) {
                writer.write(" PUBLIC \"" +
                    docType.getPublicId( ) + "\" ");
            } else {
                writer.write(" SYSTEM ");
            }
            writer.write("\"" + docType.getSystemId( ) + "\">");
            writer.write(lineSeparator);
            break;
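If you want to confirm which nodes actually sit at the top level of a document, a small sketch like the following lists the direct children of the Document node. The class name DocumentChildrenDemo, the helper topLevelNodes( ), and the sample XML are illustrative assumptions, not part of the serializer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DocumentChildrenDemo {

    // Hypothetical sample: a PI, a DOCTYPE with an internal subset, and a root element
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<?sample-pi some data?>"
        + "<!DOCTYPE book [<!ELEMENT book (#PCDATA)>]>"
        + "<book>text</book>";

    /** Returns "nodeType:nodeName" for each direct child of the Document. */
    static List<String> topLevelNodes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<String> result = new ArrayList<>();
            for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
                result.add(n.getNodeType() + ":" + n.getNodeName());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // Compare the numeric types against Node.PROCESSING_INSTRUCTION_NODE,
        // Node.DOCUMENT_TYPE_NODE, and Node.ELEMENT_NODE
        for (String entry : topLevelNodes(XML)) {
            System.out.println(entry);
        }
    }
}
```

Note that the XML declaration itself does not show up as a child node, which is why the serializer has to write it by hand in the DOCUMENT_NODE case.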
All that's left at this point is handling entities and entity
references. In this chapter, I will skim over entities and focus on
entity references; more details on entities and notations are in the
next chapter. For now, a reference can simply be output with the &
and ; characters surrounding it:
        case Node.ENTITY_REFERENCE_NODE:
            writer.write("&" + node.getNodeName( ) + ";");
            break;
There are a few surprises that may trip you up when it comes to the
output from a node such as this. The DOM's definition of how entity
references should be processed allows a lot of latitude, and relies
heavily on the underlying parser's behavior. In fact, most XML parsers
have expanded and processed entity references before the XML
document's data ever makes its way into the DOM tree. Often, when you
expect to see an entity reference within your DOM structure, you will
find the referenced text or values rather than the entity reference
itself. To test this for your parser, run the SerializerTest class on
the contents.xml document (which I'll cover in the next section) and
see what it does with the OReillyCopyright entity reference. With the
Apache parser, incidentally, it comes across as an entity reference.
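A quick way to probe your own parser is to count the entity reference nodes it leaves in the tree. The sketch below assumes a JAXP parser and uses DocumentBuilderFactory.setExpandEntityReferences( ) to request either behavior; the class name EntityExpansionDemo, its helpers, and the replacement text given to the OReillyCopyright entity are invented for illustration. What you see when expansion is turned off depends on the parser and its version, so treat the result as something to observe, not something guaranteed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class EntityExpansionDemo {

    // Hypothetical sample: the entity's replacement text is made up
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<!DOCTYPE book [<!ENTITY OReillyCopyright \"Sample copyright text\">]>"
        + "<book>&OReillyCopyright;</book>";

    /** Parses the XML, asking the parser to expand (or keep) entity references. */
    static Document parse(String xml, boolean expand) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setExpandEntityReferences(expand);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    /** Recursively counts ENTITY_REFERENCE_NODE nodes at or below the given node. */
    static int countEntityRefs(Node node) {
        int count = (node.getNodeType() == Node.ENTITY_REFERENCE_NODE) ? 1 : 0;
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            count += countEntityRefs(c);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("expanded:     "
            + countEntityRefs(parse(XML, true)) + " entity reference node(s)");
        System.out.println("not expanded: "
            + countEntityRefs(parse(XML, false)) + " entity reference node(s)");
    }
}
```

If the second count is zero as well, your parser is expanding references regardless of the setting, and the ENTITY_REFERENCE_NODE case in the serializer will never be reached for that parser.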