Specific Node-Type Interfaces (XML in a Nutshell, 2nd Edition)

Though it is possible to access the data from the original XML document using only the Node interface, the DOM Core provides a number of specific node-type interfaces that simplify common programming tasks. These specific node types can be divided into two broad categories: structural nodes and content nodes.

18.4.1. Structural Nodes

Within an XML document, a number of syntax structures exist that are not formally part of the content. The following interfaces provide access to the portions of the document that are not related to character or element data.

18.4.1.1. DocumentType

The DocumentType interface provides access to the XML document type definition's notations, entities, internal subset, public ID, and system ID. Since a document can have only one !DOCTYPE declaration, only one DocumentType node can exist for a given document. It is accessed via the doctype attribute of the Document interface. The definition of the DocumentType interface is shown in Table 18-6.

Table 18-6. DocumentType interface, derived from Node

Type	Name	Read-only	DOM 2.0
Attributes
NamedNodeMap	entities	Yes	No
DOMString	internalSubset	Yes	Yes
DOMString	name	Yes	No
NamedNodeMap	notations	Yes	No
DOMString	publicId	Yes	Yes
DOMString	systemId	Yes	Yes

Using the attributes added in DOM Level 2 (publicId, systemId, and internalSubset), it is now possible to fully reconstruct a parsed document, including its document type declaration, using only the information provided by the DOM framework. However, the DOM provides no programmatic way to modify the contents of a DocumentType node.
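The following sketch reads the Table 18-6 attributes from a parsed document. It assumes Python's standard xml.dom.minidom module as the DOM implementation (minidom covers much, but not all, of DOM Level 2 Core), and the DOCTYPE identifiers and declarations in the sample are invented for illustration. The helper name node_names is hypothetical; it simply walks a NamedNodeMap with item( ) and length.

```python
from xml.dom import minidom

# Invented sample: the public ID, system ID, and declarations are illustrative only.
SOURCE = """<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//Example//DTD Catalog//EN" "catalog.dtd" [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY publisher "Example Press">
  <!ENTITY bookcase_pic SYSTEM "bookcase.gif" NDATA gif>
]>
<catalog/>"""


def node_names(node_map):
    """Return the nodeName of every item in a DOM NamedNodeMap, in order."""
    return [node_map.item(i).nodeName for i in range(node_map.length)]


def describe_doctype(xml_text):
    """Collect the DocumentType attributes listed in Table 18-6."""
    doc = minidom.parseString(xml_text)
    doctype = doc.doctype  # the single DocumentType node (None if there is no <!DOCTYPE>)
    return {
        "name": doctype.name,
        "publicId": doctype.publicId,
        "systemId": doctype.systemId,
        "entities": node_names(doctype.entities),
        "notations": node_names(doctype.notations),
    }


info = describe_doctype(SOURCE)
print(info)
```

Since minidom does not retrieve external DTDs by default, only the declarations from the internal subset appear in entities and notations here.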

18.4.1.2. ProcessingInstruction

This node type provides direct access to the contents of an XML processing instruction. Processing instructions usually appear within the document's content, but they may also appear before or after the root element, as well as in DTDs. Table 18-7 describes the ProcessingInstruction node's attributes.

Table 18-7. ProcessingInstruction interface, derived from Node

Type	Name	Read-only	DOM 2.0
Attributes
DOMString	data	No	No
DOMString	target	Yes	No

Though processing instructions resemble normal XML tags, remember that the only syntactically defined part is the target, which must be an XML name. The remaining data (up to the terminating ?>) is free-form. See Chapter 17 for more information about uses (and potential misuses) of XML processing instructions.
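To illustrate that only the target is structured, this sketch collects every processing instruction in a document as a (target, data) pair. It assumes Python's xml.dom.minidom as the DOM implementation; the "render" target and the sample document are hypothetical, while xml-stylesheet is a commonly used real target. Note how the pseudo-attributes come back as one unparsed data string.

```python
from xml.dom import minidom, Node

# Invented sample; "render" is a hypothetical PI target.
SOURCE = """<?xml-stylesheet href="style.css" type="text/css"?>
<book><?render mode="draft"?><title>Sample</title></book>"""


def collect_pis(node, found=None):
    """Depth-first walk that records (target, data) for each processing instruction."""
    if found is None:
        found = []
    for child in node.childNodes:
        if child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
            # target is the only structured part; data is the free-form remainder
            found.append((child.target, child.data))
        collect_pis(child, found)
    return found


doc = minidom.parseString(SOURCE)
pis = collect_pis(doc)
print(pis)

# The DOM makes data writable (target is read-only), so the free-form text can be replaced.
render = doc.getElementsByTagName("book")[0].firstChild
render.data = 'mode="final"'
print(render.toxml())
```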

18.4.1.3. Notation

XML notations formally declare the format of external unparsed entities and processing instruction targets. The list of all declared notations is stored in the notations NamedNodeMap of the document's DocumentType node, which is accessed through the doctype attribute of the Document interface. The definition of the Notation interface is shown in Table 18-8.

Table 18-8. Notation interface, derived from Node

Type	Name	Read-only	DOM 2.0
Attributes
DOMString	publicId	Yes	No
DOMString	systemId	Yes	No
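A minimal sketch of listing the declared notations follows. It assumes Python's xml.dom.minidom as the DOM implementation; the notation names and identifiers in the sample DTD are invented. The second notation is declared with only a system identifier, so its publicId comes back empty (None in Python).

```python
from xml.dom import minidom

# Invented sample DTD: notation names and identifiers are illustrative only.
SOURCE = """<!DOCTYPE gallery [
  <!NOTATION gif PUBLIC "-//Example//NOTATION GIF Image//EN" "image/gif">
  <!NOTATION png SYSTEM "image/png">
  <!ELEMENT gallery EMPTY>
]>
<gallery/>"""


def list_notations(xml_text):
    """Map each declared notation's name to its (publicId, systemId) pair."""
    # Notations live in the DocumentType node, reached through Document.doctype.
    notations = minidom.parseString(xml_text).doctype.notations
    result = {}
    for i in range(notations.length):
        notation = notations.item(i)
        result[notation.nodeName] = (notation.publicId, notation.systemId)
    return result


print(list_notations(SOURCE))
```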

18.4.1.4. Entity

The name of the Entity interface is somewhat ambiguous, but its meaning becomes clear when it is connected with the EntityReference interface, which is also part of the DOM Core. The Entity interface provides access to the entity declaration's notation name, public ID, and system ID. For a parsed entity, the replacement text (if the parser has read it) appears as the Entity node's childNodes; an unparsed entity has no children, but its notationName identifies the notation that describes its format. The definition of this interface is shown in Table 18-9.

Table 18-9. Entity interface, derived from Node

Type	Name	Read-only	DOM 2.0
Attributes
DOMString	notationName	Yes	No
DOMString	publicId	Yes	No
DOMString	systemId	Yes	No

All members of this interface are read-only and cannot be modified at runtime.
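The difference between parsed and unparsed entities can be seen by inspecting the Entity nodes directly. This sketch assumes Python's xml.dom.minidom as the DOM implementation and uses an invented DTD modeled on the bookcase picture example from the Attr discussion in this chapter; the function name classify_entities is hypothetical.

```python
from xml.dom import minidom

# Invented sample DTD modeled on the bookcase picture example.
SOURCE = """<!DOCTYPE picture [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY bookcase_pic SYSTEM "bookcase.gif" NDATA gif>
  <!ENTITY shop "Woodworking Shop">
  <!ELEMENT picture EMPTY>
  <!ATTLIST picture src ENTITY #REQUIRED>
]>
<picture src="bookcase_pic"/>"""


def classify_entities(xml_text):
    """Describe each general entity as parsed (with replacement text) or unparsed."""
    entities = minidom.parseString(xml_text).doctype.entities
    summary = {}
    for i in range(entities.length):
        entity = entities.item(i)
        if entity.notationName is not None:
            # Unparsed: no children, just a notation name plus identifiers.
            summary[entity.nodeName] = ("unparsed", entity.notationName, entity.systemId)
        else:
            # Parsed: minidom stores an internal entity's replacement text as a Text child.
            text = "".join(child.data for child in entity.childNodes)
            summary[entity.nodeName] = ("parsed", text)
    return summary


for name, details in classify_entities(SOURCE).items():
    print(name, details)
```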

18.4.2. Content Nodes

The actual data conveyed by an XML document is contained completely within the document element. The following node types map directly to the XML document's nonstructural parts, such as character data, elements, and attribute values.

18.4.2.1. Document

Each parsed document causes the creation of a single Document node in memory. (Empty Document nodes can be created through the DOMImplementation interface.) This interface provides access to the document type information and the single, top-level Element node that contains the entire body of the parsed document. It also provides the factory methods that allow an application to create new content nodes that do not come from parsing a document. Table 18-10 shows all attributes and methods of the Document interface.

Table 18-10. Document interface, derived from Node

Type		Name
Attributes
DocumentType		doctype
DOMImplementation		implementation
Element		documentElement
Methods
Attr		createAttribute
	DOMString	name
Attr		createAttributeNS
	DOMString	namespaceURI
	DOMString	qualifiedName
CDATASection		createCDATASection
	DOMString	data
Comment		createComment
	DOMString	data
DocumentFragment		createDocumentFragment
Element		createElement
	DOMString	tagName
Element		createElementNS
	DOMString	namespaceURI
	DOMString	qualifiedName
EntityReference		createEntityReference
	DOMString	name
ProcessingInstruction		createProcessingInstruction
	DOMString	target
	DOMString	data
Text		createTextNode
	DOMString	data
Element		getElementById
	DOMString	elementId
NodeList		getElementsByTagName
	DOMString	tagname
NodeList		getElementsByTagNameNS
	DOMString	namespaceURI
	DOMString	localName
Node		importNode
	Node	importedNode
	Boolean	deep

The various create...( ) methods are important for applications that wish to modify the structure of a previously parsed document. Note that nodes created using one Document instance may only be inserted into the document tree belonging to the Document that created them; a conforming implementation raises a WRONG_DOCUMENT_ERR DOMException otherwise. DOM Level 2 provides a new importNode( ) method that copies a node, and optionally its descendants, from one document into another. The copy belongs to the importing document, and the original is left unchanged.
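The sketch below builds a small document from scratch with the factory methods and then moves a copy into a second document with importNode( ). It assumes Python's xml.dom.minidom as the DOM implementation; the element and attribute names are invented. minidom is lenient and does not itself raise WRONG_DOCUMENT_ERR, so the sketch simply follows the rule by importing before inserting.

```python
from xml.dom import minidom

impl = minidom.getDOMImplementation()

# An empty Document created through DOMImplementation, then populated by factory methods.
source = impl.createDocument(None, "library", None)
book = source.createElement("book")
book.setAttribute("edition", "2")
book.appendChild(source.createTextNode("Sample title"))
book.appendChild(source.createComment(" added by hand "))
source.documentElement.appendChild(book)

# A second document: nodes from `source` are imported before being inserted here.
target = impl.createDocument(None, "shelf", None)
imported = target.importNode(book, True)  # deep=True also copies the text and comment children
target.documentElement.appendChild(imported)

print(target.documentElement.toxml())
```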

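Besides the various node-creation methods, some methods can locate specific XML elements or lists of elements. The getElementsByTagName( ) and getElementsByTagNameNS( ) methods return a NodeList of all elements, in document order, that match the given tag name or, for getElementsByTagNameNS( ), the given namespace URI and local name. The getElementById( ) method returns the single element whose ID-type attribute has the given value, or null if there is none. An attribute counts as an ID only if it is declared with type ID (for example, in the DTD), not merely because it is named id.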
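Here is a minimal sketch of the three lookup methods, assuming Python's xml.dom.minidom as the DOM implementation. The sample catalog is invented; the Dublin Core namespace URI is real but used here only as an example. The <!ATTLIST> declaration is what lets getElementById( ) recognize the id attributes; without it, minidom returns None.

```python
from xml.dom import minidom

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core namespace, used only as an example

# Invented sample; the ATTLIST declaration makes `id` an ID-type attribute.
SOURCE = """<!DOCTYPE catalog [
  <!ATTLIST book id ID #REQUIRED>
]>
<catalog xmlns:dc="http://purl.org/dc/elements/1.1/">
  <book id="b1"><dc:title>First Title</dc:title></book>
  <book id="b2"><dc:title>Second Title</dc:title></book>
</catalog>"""

doc = minidom.parseString(SOURCE)

books = doc.getElementsByTagName("book")          # matched by tag name, in document order
titles = doc.getElementsByTagNameNS(DC, "title")  # matched by namespace URI + local name
second = doc.getElementById("b2")                 # relies on the DTD's ID declaration

print([b.getAttribute("id") for b in books])
print([t.firstChild.data for t in titles])
print(second is books[1])
```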

18.4.2.2. DocumentFragment

Applications that allow real-time editing of XML documents sometimes need to temporarily park document nodes outside the hierarchy of the parsed document. A visual editor that provides clipboard functionality is one example. To implement the cut function, the editor can move the cut nodes into a DocumentFragment node instead of deleting them or leaving them in place within the live document. When the nodes are pasted, they can be moved back: inserting a DocumentFragment into a tree inserts its children in its place and leaves the fragment empty. The DocumentFragment interface, derived from Node, has no interface-specific attributes or methods.
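The clipboard scenario can be sketched in a few lines, assuming Python's xml.dom.minidom as the DOM implementation. The sample document and the cut_and_paste function are hypothetical; the point is that appendChild( ) moves nodes rather than copying them, and that appending the fragment transfers its children.

```python
from xml.dom import minidom

# Invented sample document for the clipboard scenario.
SAMPLE = "<doc><draft><p>one</p><p>two</p></draft><final/></doc>"


def cut_and_paste(xml_text, cut_tag, paste_parent_tag):
    """Move every <cut_tag> element into the first <paste_parent_tag> via a fragment."""
    doc = minidom.parseString(xml_text)
    clipboard = doc.createDocumentFragment()
    # "Cut": appendChild detaches each node from the live tree and parks it in the fragment.
    for node in list(doc.getElementsByTagName(cut_tag)):
        clipboard.appendChild(node)
    # "Paste": inserting the fragment inserts its children, leaving the fragment empty.
    doc.getElementsByTagName(paste_parent_tag)[0].appendChild(clipboard)
    return doc, clipboard


doc, clipboard = cut_and_paste(SAMPLE, "p", "final")
print(doc.documentElement.toxml())
```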

18.4.2.3. Element

Element nodes are the most frequently encountered node type in a typical XML document. These nodes are parents for the Text, Comment, EntityReference, ProcessingInstruction, CDATASection, and child Element nodes that comprise the document's body. They also allow access to the Attr objects that contain the element's attributes. Table 18-11 shows all attributes and methods supported by the Element interface.

Table 18-11. Element interface, derived from Node

Type		Name
Attributes
DOMString		tagName
Methods
DOMString		getAttribute
	DOMString	name
Attr		getAttributeNode
	DOMString	name
Attr		getAttributeNodeNS
	DOMString	namespaceURI
	DOMString	localName
DOMString		getAttributeNS
	DOMString	namespaceURI
	DOMString	localName
NodeList		getElementsByTagName
	DOMString	name
NodeList		getElementsByTagNameNS
	DOMString	namespaceURI
	DOMString	localName
Boolean		hasAttribute
	DOMString	name
Boolean		hasAttributeNS
	DOMString	namespaceURI
	DOMString	localName
Void		removeAttribute
	DOMString	name
Attr		removeAttributeNode
	Attr	oldAttr
Void		removeAttributeNS
	DOMString	namespaceURI
	DOMString	localName
Void		setAttribute
	DOMString	name
	DOMString	value
Attr		setAttributeNode
	Attr	newAttr
Attr		setAttributeNodeNS
	Attr	newAttr
Void		setAttributeNS
	DOMString	namespaceURI
	DOMString	qualifiedName
	DOMString	value
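A short sketch of the Element attribute methods, with and without namespaces, follows. It assumes Python's xml.dom.minidom as the DOM implementation; the picture element echoes the example used with Attr, and the XLink namespace URI is real but serves only as a sample namespace. As the DOM specifies, getAttribute( ) returns an empty string for a missing attribute.

```python
from xml.dom import minidom

XLINK = "http://www.w3.org/1999/xlink"  # real XLink namespace URI, used only as an example

doc = minidom.parseString(
    '<picture xmlns:xlink="%s" alt="old" width="100"/>' % XLINK
)
picture = doc.documentElement

picture.setAttribute("alt", "3/4 view of bookcase")          # replaces the existing value
picture.setAttributeNS(XLINK, "xlink:href", "bookcase.gif")  # qualified name + namespace URI
picture.removeAttribute("width")

print(picture.tagName, picture.getAttribute("alt"))
print(picture.getAttributeNS(XLINK, "href"))  # looked up by namespace URI + local name
print(picture.hasAttribute("width"), repr(picture.getAttribute("width")))
```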

18.4.2.4. Attr

Since XML attributes may contain either text values or entity references, the DOM stores element attribute values as Node subtrees. The following XML fragment shows an element with two attributes:

<!ENTITY bookcase_pic SYSTEM "bookcase.gif" NDATA gif>
<!ELEMENT picture EMPTY>
<!ATTLIST picture
   src ENTITY #REQUIRED
   alt CDATA #IMPLIED>
. . .
<picture src="bookcase_pic" alt="3/4 view of bookcase"/>

The first attribute names an unparsed entity; the second contains a simple string. Because the DOM framework stores element attributes as instances of the Attr interface, which can have child nodes, a few parsers make the contents of attributes available as actual subtrees of Text and EntityReference nodes. An EntityReference child appears only where the attribute value contains a general entity reference (&name;); the src value in this example is simply the name of an unparsed entity, so it would be stored as ordinary text. Note that the nodeValue of the Attr node gives the flattened text value of the Attr node's children. Table 18-12 shows the attributes supported by the Attr interface.

Table 18-12. Attr interface, derived from Node

Type	Name	Read-only	DOM 2.0
Attributes
DOMString	name	Yes	No
Element	ownerElement	Yes	Yes
Boolean	specified	Yes	No
DOMString	value	No	No

Besides the attribute name and value, the Attr interface exposes the specified flag, which is true if this attribute was included explicitly in the XML document and false if its value was supplied as a default from the DTD's !ATTLIST declaration. The ownerElement attribute, new in DOM Level 2, is a back pointer to the Element node that owns this attribute object; it is null if the attribute is not currently attached to an element.
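This sketch works with Attr nodes directly rather than through their string values, using the picture element from the fragment above. It assumes Python's xml.dom.minidom as the DOM implementation; the width attribute is invented. The specified flag is not exercised, because whether DTD defaults are applied depends on the parser.

```python
from xml.dom import minidom

doc = minidom.parseString('<picture src="bookcase_pic" alt="3/4 view of bookcase"/>')
picture = doc.documentElement

alt = picture.getAttributeNode("alt")  # the Attr node itself, not just its string value
print(alt.name, "=", alt.value)
print(alt.ownerElement.tagName)        # back pointer to the owning Element

# A free-standing Attr from the factory method has no owner until it is attached.
width = doc.createAttribute("width")
width.value = "200"
owner_before = width.ownerElement
picture.setAttributeNode(width)
owner_after = width.ownerElement
print(owner_before, owner_after.tagName)
```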

18.4.2.5. CharacterData

Several types of data within a DOM node tree represent blocks of character data that do not include markup. CharacterData is an abstract interface that provides common text-manipulation methods inherited by the concrete interfaces Comment, Text, and CDATASection. Table 18-13 shows the attributes and methods supported by the CharacterData interface.

Table 18-13. CharacterData interface, derived from Node

Type		Name
Attributes
DOMString		data
Unsigned long		length
Methods
Void		appendData
	DOMString	arg
Void		deleteData
	Unsigned long	offset
	Unsigned long	count
Void		insertData
	Unsigned long	offset
	DOMString	arg
Void		replaceData
	Unsigned long	offset
	Unsigned long	count
	DOMString	arg
DOMString		substringData
	Unsigned long	offset
	Unsigned long	count
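The following sketch applies each CharacterData method in turn to one Text node, with the intermediate value noted in a comment after each call. It assumes Python's xml.dom.minidom as the DOM implementation; the strings are invented. A Comment node is edited at the end to show that the same methods are inherited there.

```python
from xml.dom import minidom

doc = minidom.Document()

# Each call below edits the same Text node in place through CharacterData methods.
text = doc.createTextNode("bookcase")
text.insertData(0, "oak ")       # "oak bookcase"
text.appendData(", 3/4 view")    # "oak bookcase, 3/4 view"
text.replaceData(0, 3, "pine")   # "pine bookcase, 3/4 view"
text.deleteData(13, 10)          # "pine bookcase"
print(text.data, text.length, text.substringData(5, 8))

# Comment nodes inherit the same methods.
note = doc.createComment("draft")
note.appendData(" 2")
print(note.data)
```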

18.4.2.8. Text

Table 18-14. Text interface, derived from CharacterData

Type		Name
Methods
Text		splitText
	Unsigned long	offset

The splitText( ) method splits a single Text node into two nodes at a given offset. The original node keeps the characters before the offset, and a new Text node containing the rest is returned and, if the original node is in a tree, inserted as its next sibling. This split is useful when an editing application needs to insert additional markup nodes into an existing island of character data: after the split, new nodes can be inserted into the resulting gap.
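As an example of the editing use case, this sketch wraps one word of a paragraph in a new em element by splitting the surrounding Text node twice. It assumes Python's xml.dom.minidom as the DOM implementation; the emphasize function and the sample paragraph are hypothetical, and the function only looks at the paragraph's first Text child.

```python
from xml.dom import minidom


def emphasize(para, word):
    """Wrap the first occurrence of `word` in para's first Text child with <em>."""
    text = para.firstChild
    start = text.data.index(word)   # raises ValueError if the word is absent
    middle = text.splitText(start)  # text keeps the characters before the word
    middle.splitText(len(word))     # middle now holds only the word
    em = para.ownerDocument.createElement("em")
    para.replaceChild(em, middle)   # put <em> where the word's Text node was
    em.appendChild(middle)          # then move that Text node inside it
    return para


doc = minidom.parseString("<p>A 3/4 view of the bookcase.</p>")
para = emphasize(doc.documentElement, "bookcase")
print(para.toxml())
```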

18.4.2.9. CDATASection

CDATA sections provide a simplified way to include characters that would normally be considered markup in an XML document. These sections are stored within a DOM document tree as CDATASection nodes. The CDATASection interface, derived from Text, has no interface-specific attributes or methods.
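The sketch below parses a CDATA section and checks how it surfaces in the tree. It assumes Python's xml.dom.minidom as the DOM implementation, which by default keeps CDATA sections as CDATASection nodes; the code snippet inside the section is invented. The markup characters arrive unescaped in data and are written back inside a CDATA section on output.

```python
from xml.dom import minidom, Node

doc = minidom.parseString("<code><![CDATA[if (a < b && b > c) return;]]></code>")
section = doc.documentElement.firstChild

print(section.nodeType == Node.CDATA_SECTION_NODE)  # a CDATASection node, not plain Text
print(isinstance(section, minidom.Text))            # minidom models CDATASection as a Text subclass
print(section.data)                                 # markup characters arrive unescaped
print(doc.documentElement.toxml())                  # serialized back inside <![CDATA[ ]]>
```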
