XML Document Syntax (Web Design in a Nutshell, 2nd Edition)

30.3.1. Well-Formed XML

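Browsers often recover from sloppily written or illegal HTML. This is not the case with XML documents. Because each XML language defines its own elements, a client cannot guess what a malformed document was meant to say, and the XML specification requires a parser that encounters a well-formedness error to report it and stop normal processing. The rules for coding the document therefore need to be followed to the letter to ensure proper interpretation by the XML client. When a document follows the XML markup rules, it is said to be well-formed.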

The primary rules for a well-formed XML document are:

There may be no white space (character spaces or line breaks) before the XML declaration.

All element attribute values must be in quotation marks (either single or double quotes).

Element and attribute names are case-sensitive; for example, <par>, <PAR>, and <Par> are considered to be three different tags.

An element must have both an opening and closing tag, unless it is an empty element.

If an empty element is written as a single standalone tag, it must include a closing slash before the end of the tag (for example, <img/>).

All opening and closing tags must nest correctly and not overlap.

The document must have a single root element, a unique element that encloses the entire document. The root element may be used only once in the document.

Isolated markup characters are not allowed in text: a literal < or & must always be replaced by the equivalent predefined character entity, and so must a > that follows ]]. Table 30-1 lists the predefined character entities in XML.

Table 30-1. Predefined character entities in XML

Entity	Char	Notes
`&amp;`	`&`	Must not be used inside processing instructions
`&lt;`	`<`	Use for a literal < in text and inside attribute values
`&gt;`	`>`	Use after `]]` in normal text and inside processing instructions
`&quot;`	`"`	Use inside attribute values quoted with `"`
`&apos;`	`'`	Use inside attribute values quoted with `'`
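Most XML toolkits can apply these substitutions for you. As a minimal sketch, assuming Python (the chapter itself uses no programming language), the standard library's `xml.sax.saxutils` module covers the entities in Table 30-1; the helper names `escape_text` and `escape_all` are hypothetical:

```python
from xml.sax.saxutils import escape, quoteattr

# Extra substitutions for the two quote characters; escape() always
# handles &, < and > on its own.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_text(text: str) -> str:
    """Hypothetical helper: escape character data that sits between tags."""
    return escape(text)


def escape_all(text: str) -> str:
    """Hypothetical helper: escape all five predefined entities."""
    return escape(text, QUOTE_ENTITIES)


print(escape_text("AT&T: 1 < 2 > 0"))  # AT&amp;T: 1 &lt; 2 &gt; 0
print(escape_all("\"x\" & 'y'"))       # &quot;x&quot; &amp; &apos;y&apos;
# quoteattr() escapes a value and wraps it in whichever quote style it needs.
print(quoteattr('He said "hi"'))       # 'He said "hi"'
```

By default escape() leaves quote characters alone, which is safe in text content but not inside a quoted attribute value; the extra mapping and quoteattr() handle that case.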

You can check whether the syntax of your XML document is correct using a well-formedness checker (also called a nonvalidating parser). Parsers are built into Netscape 6 and Internet Explorer 5.5. You may also want to check out the list of nonvalidating parsers provided by the Web Developer's Virtual Library at http://wdvl.com/Software/XML/parsers.html.
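Any nonvalidating parser can serve as a quick checker. The sketch below assumes Python, whose standard `xml.etree.ElementTree` module wraps the nonvalidating expat parser; `check_well_formed` and the sample documents are hypothetical, with one sample per rule in the list above plus one showing that a lone > in text is allowed:

```python
import xml.etree.ElementTree as ET


def check_well_formed(document: str) -> tuple[bool, str]:
    """Hypothetical helper: return (True, "") if the document parses,
    otherwise (False, <parser error message>). No DTD validation is done."""
    try:
        ET.fromstring(document)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


# Hypothetical samples; only the first two are well-formed.
samples = {
    "well-formed": '<?xml version="1.0"?><doc><p class="x">Fish &amp; chips</p><br/></doc>',
    "lone > in text": "<doc>2 > 1</doc>",
    "space before declaration": ' <?xml version="1.0"?><doc/>',
    "unquoted attribute": "<doc><p class=x>text</p></doc>",
    "case mismatch": "<doc><par>text</PAR></doc>",
    "empty tag without slash": "<doc><br></doc>",
    "overlapping tags": "<doc><b><i>text</b></i></doc>",
    "two root elements": "<doc/><doc/>",
    "bare ampersand": "<doc>Fish & chips</doc>",
}

for label, text in samples.items():
    ok, message = check_well_formed(text)
    print(f"{label}: {'OK' if ok else 'error: ' + message}")
```

Because the parser stops at the first fatal error, each message describes only the first problem found; fix it and run the check again.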

30.3.2. Namespaces

With XML, your document may use tags that come from different "types" of XML documents. For example, you might have an XHTML document that contains some math expressions written using the MathML XML dialect. But in this case, how can you differentiate between an <a> tag coming from XHTML (an anchor) and an <a> tag that might come from MathML (an absolute value)?

The W3C anticipated such "collisions" and responded by creating the namespace convention. A namespace is a group of element and attribute names that is unique for each XML dialect. Namespaces take names that look just like URLs (they are not links to actual documents, however) to ensure uniqueness and provide information about the organization that maintains the namespace. When your document uses an element or attribute, its namespace tells the browser (or other XML application) which dialect the name belongs to, and therefore how it should be interpreted; nothing is fetched from the namespace URL.

Namespaces are declared in an XML document using the xmlns attribute. You can establish the namespace for a whole document or for an individual element. The value of the xmlns attribute is the URL-like namespace name itself. This example establishes XHTML as the default namespace for the document (XHTML 1.0 uses this same namespace name for its strict, transitional, and frameset versions; the DOCTYPE, not the namespace, distinguishes them):

<html xmlns="http://www.w3.org/1999/xhtml">
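To see what the declaration does, you can inspect the names a namespace-aware parser reports. This sketch assumes Python's standard `xml.etree.ElementTree`, which writes each name as `{namespace}local`; the hypothetical helper `split_name` separates the two parts and shows that the default namespace applies to <html> and to every unprefixed element inside it:

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

page = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Demo</title></head>"
    "<body><p>Hello</p></body>"
    "</html>"
)


def split_name(tag: str) -> tuple[str, str]:
    """Hypothetical helper: split '{namespace}local' into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag  # element in no namespace


# iter() walks the root and all of its descendants in document order.
for element in page.iter():
    print(split_name(element.tag))
```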

If you need to include math markup, you can apply the xmlns attribute to the specific element, so the browser treats that element and its contents as MathML rather than XHTML:

<div xmlns="http://www.w3.org/1998/Math/MathML">46/100</div>
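The override applies only to the element that carries it and that element's contents. As a sketch, assuming Python's standard `xml.etree.ElementTree`, placing the chapter's example inside an XHTML page (with an ordinary XHTML div added after it for contrast) yields two elements that share the local name div but belong to different namespaces:

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"

page = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<div xmlns="http://www.w3.org/1998/Math/MathML">46/100</div>'
    "<div>An ordinary XHTML division</div>"
    "</body></html>"
)

# ElementTree spells namespaced names as {namespace}local.
body = page.find(f"{{{XHTML}}}body")
for div in body:
    # The first div is in the MathML namespace; the override does not
    # carry over to its sibling, which stays in the XHTML default namespace.
    print(div.tag, "->", div.text)
```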

If you plan to refer to a namespace repeatedly within a document, you can declare it just once at the beginning of the document and give it a label, called a prefix. Then refer to it in each tag by placing the prefix before the tag name, separated by a colon (:). For example:

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:math="http://www.w3.org/1998/Math/MathML">

The full namespace can now be shortened to math later in the document. The result is much tidier code (and smaller file sizes!):

<math:div>46/100</math:div>
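The prefix is only a local shorthand: a namespace-aware parser replaces it with the full namespace name, so the prefixed element is identical to one declared with its own xmlns attribute, and a prefix that was never declared is rejected. A minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the prefix m in the query is arbitrary, chosen to show that only the namespace name matters:

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/1998/Math/MathML"

prefixed = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"'
    ' xmlns:math="http://www.w3.org/1998/Math/MathML">'
    "<body><math:div>46/100</math:div></body></html>"
)
declared_inline = ET.fromstring(
    '<div xmlns="http://www.w3.org/1998/Math/MathML">46/100</div>'
)

# Query with a different prefix ("m"); the mapping ties it to the same namespace.
math_div = prefixed.find(".//m:div", {"m": MATHML})
print(math_div.tag == declared_inline.tag)  # True

# Using the math prefix with no xmlns:math declaration in scope is an error.
try:
    ET.fromstring("<math:div>46/100</math:div>")
except ET.ParseError as err:
    print("rejected:", err)
```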
