XML Reference (Webmaster in a Nutshell, 3rd Edition)

10.2. XML Reference

Now that you have had a quick taste of working with XML, here is an overview of the more common rules and constructs of the XML language.

10.2.1. Well-Formed XML

These are the rules for a well-formed XML document:

All attribute values must be enclosed in quotation marks (single or double).
An element must have both an opening and a closing tag, unless it is an empty element.
If a tag is a standalone empty element, it must contain a closing slash (/) before the end of the tag.
All opening and closing element tags must nest correctly.
Isolated markup characters are not allowed in text: a literal < or & must be written as the entity reference &lt; or &amp;. In addition, the sequence ]]> must be written as ]]&gt; when it appears in ordinary text. (Entity references are discussed in further detail later.)
In a well-formed XML document without a corresponding DTD, every attribute is treated as type CDATA by default.
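To see these rules enforced, here is a minimal sketch in Python 3 using the standard-library expat parser; the helper name is_well_formed and the sample strings are hypothetical. It reports whether a string parses without a well-formedness error, and it uses xml.sax.saxutils.escape to replace &, <, and > in text with entity references before the text is placed inside an element:

```python
from xml.parsers import expat
from xml.sax.saxutils import escape


def is_well_formed(text: str) -> bool:
    """Return True if expat parses `text` without a well-formedness error."""
    parser = expat.ParserCreate()  # a parser object can only be used once
    try:
        parser.Parse(text.encode("utf-8"), True)
    except expat.ExpatError:
        return False
    return True


# Quoted attributes, correctly nested tags, and a self-closing empty element.
good = '<order id="42"><item qty="2"/><note>Ships today</note></order>'
# An unquoted attribute value and overlapping tags each break a rule above.
bad_quotes = '<order id=42></order>'
bad_nesting = '<b><i>text</b></i>'

# escape() turns &, <, and > into &amp;, &lt;, and &gt;,
# so the "]]>" sequence in the raw text becomes "]]&gt;".
raw = 'Profit & Loss < 5 ]]> done'
safe = '<note>' + escape(raw) + '</note>'

print(is_well_formed(good))                        # True
print(is_well_formed(bad_quotes))                  # False
print(is_well_formed(bad_nesting))                 # False
print(is_well_formed('<note>' + raw + '</note>'))  # False
print(safe)
```

A non-validating parser such as expat checks only well-formedness; it does not compare the document against a DTD.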

10.2.2. Special Markup

XML uses the following special markup constructs.

<?xml ...?>

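Although they are not required to, XML documents typically begin with an XML declaration, which must start with the characters <?xml and end with the characters ?>. If present, the declaration must be the very first thing in the document, with no whitespace or comments before it. It can contain the following attributes: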

Attributes

version: The version attribute specifies the version of the XML specification to which the document conforms. This is almost always 1.0 (a later XML 1.1 specification exists but is rarely used). This attribute cannot be omitted.
encoding: The encoding attribute specifies the character encoding used in the document (e.g., UTF-8 or iso-8859-1). UTF-8 and UTF-16 are the only encodings that an XML processor is required to handle. This attribute is optional.
standalone: The optional standalone attribute indicates whether the document depends on markup declarations stored outside the document itself, such as in an external DTD. The value must be either yes or no (the default). A value of yes tells the processor that no external markup declarations affect the document's content; no, or omitting the attribute, means such declarations may be needed. A document with no DTD at all does not need this attribute.

For example:

<?xml version="1.0"?>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

Syntax:

<?xml version="number"
[encoding="encoding"]
[standalone="yes|no"] ?>
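An application can read these attribute values from the parser rather than scanning the text itself. The following minimal sketch assumes Python 3's xml.parsers.expat module, whose XmlDeclHandler callback receives the version, the encoding, and a standalone flag; read_xml_declaration is a hypothetical helper name:

```python
from xml.parsers import expat


def read_xml_declaration(data: bytes):
    """Return (version, encoding, standalone) from the XML declaration.

    Each value is None when the declaration, or that attribute, is absent.
    """
    result = {"version": None, "encoding": None, "standalone": None}

    def on_decl(version, encoding, standalone):
        # expat reports standalone as 1 (yes), 0 (no), or -1 (omitted).
        result["version"] = version
        result["encoding"] = encoding
        result["standalone"] = {1: "yes", 0: "no"}.get(standalone)

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_decl
    parser.Parse(data, True)
    return result["version"], result["encoding"], result["standalone"]


print(read_xml_declaration(b'<?xml version="1.0"?><doc/>'))
# ('1.0', None, None)
print(read_xml_declaration(
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><doc/>'))
# ('1.0', 'UTF-8', 'yes')
```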

<?...?>

A processing instruction lets you place information intended for a specific outside application within the document. Processing instructions always begin with the characters <? followed by a target name that identifies the application, and end with the characters ?>. For example:

<?works document="hello.doc" data="hello.wks"?>

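You can create your own processing instructions, but they are useful only if the application processing the document knows what the data means and acts on it accordingly. The name="value" syntax shown here is a common convention; XML itself treats everything after the target as unstructured text.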

Syntax:

<?target attribute1="value"
attribute2="value"
... ?>
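Because the XML processor passes a processing instruction to the application as a target plus a string of data, the application decides how to interpret that data. This sketch assumes Python 3's expat bindings; collect_pis and the regular expression that splits name="value" pairs are illustrative, not part of any standard API:

```python
import re
from xml.parsers import expat

# Hypothetical pattern for name="value" pairs. XML treats PI data as
# opaque text; the attribute-like syntax is only a convention.
PSEUDO_ATTR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def collect_pis(data: bytes):
    """Return a list of (target, {name: value}) for each processing instruction."""
    found = []

    def on_pi(target, text):
        # expat passes the target and the remaining data separately.
        found.append((target, dict(PSEUDO_ATTR.findall(text))))

    parser = expat.ParserCreate()
    parser.ProcessingInstructionHandler = on_pi
    parser.Parse(data, True)
    return found


doc = b'<?works document="hello.doc" data="hello.wks"?><report/>'
print(collect_pis(doc))
# [('works', {'document': 'hello.doc', 'data': 'hello.wks'})]
```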

<!DOCTYPE>

The <!DOCTYPE> instruction allows you to specify a DTD for an XML document. This instruction currently takes one of two forms:

<!DOCTYPE root-element SYSTEM "URI_of_DTD">
<!DOCTYPE root-element PUBLIC "name" "URI_of_DTD">

Keywords

SYSTEM

The SYSTEM variant specifies the URI location of a DTD for private use in the document. For example:

<!DOCTYPE Book SYSTEM
   "http://mycompany.com/dtd/mydoctype.dtd">

PUBLIC

The PUBLIC variant is used when a DTD has been published for widespread use. In these cases, the DTD is assigned a unique public name, which the XML processor may use on its own to try to locate the DTD. If this fails, the processor uses the URI:

<!DOCTYPE Book PUBLIC "-//O'Reilly//DTD//EN"
   "http://www.oreilly.com/dtd/xmlbk.dtd">

Public DTDs follow a specific naming convention. See the XML specification for details on naming public DTDs.

Syntax:

<!DOCTYPE root-element SYSTEM|PUBLIC
["name"] "URI_of_DTD">
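A parser reports the root element name and the SYSTEM or PUBLIC identifiers to the application. As a sketch, assuming Python 3's expat module and its StartDoctypeDeclHandler callback (read_doctype is a hypothetical helper), the identifiers from the examples above can be read without fetching either DTD:

```python
from xml.parsers import expat


def read_doctype(data: bytes):
    """Return (root_name, system_id, public_id) from <!DOCTYPE>, or None."""
    result = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        result.append((name, system_id, public_id))

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    # By default expat does not load the external DTD; it only reports
    # the identifiers, so no network access happens here.
    parser.Parse(data, True)
    return result[0] if result else None


system_doc = (b'<!DOCTYPE Book SYSTEM '
              b'"http://mycompany.com/dtd/mydoctype.dtd"><Book/>')
public_doc = (b'<!DOCTYPE Book PUBLIC "-//O\'Reilly//DTD//EN" '
              b'"http://www.oreilly.com/dtd/xmlbk.dtd"><Book/>')

print(read_doctype(system_doc))
print(read_doctype(public_doc))
```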

<!-- ... -->

You can place comments anywhere in an XML document except inside element tags or before the XML declaration. Comments always start with the characters <!-- and end with the characters -->, and they may not contain a double hyphen (--) anywhere within the comment. The contents of a comment are not treated as part of the document's text or markup. For example:

<!-- Sales Figures Start Here -->
<Units>2000</Units>
<Cost>49.95</Cost>

Syntax:

<!-- comments -->
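Some parsers still report comments to the application even though comment text is not part of the document's content. A minimal sketch, assuming Python 3's expat bindings (comments_in is a hypothetical helper), shows a comment being reported and a comment containing a double hyphen being rejected:

```python
from xml.parsers import expat


def comments_in(data: bytes):
    """Return the text of every comment; raise ExpatError if malformed."""
    found = []
    parser = expat.ParserCreate()
    parser.CommentHandler = found.append  # called with the comment's text
    parser.Parse(data, True)
    return found


doc = b'<Sales><!-- Sales Figures Start Here --><Units>2000</Units></Sales>'
print(comments_in(doc))  # [' Sales Figures Start Here ']

try:
    comments_in(b'<Sales><!-- bad -- comment --></Sales>')
except expat.ExpatError as err:
    # A "--" inside a comment is a well-formedness error.
    print("rejected:", err)
```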

CDATA

You can define special sections of character data, or CDATA, which the XML processor does not attempt to interpret as markup. Anything included inside a CDATA section is treated as plain text. CDATA sections begin with the characters <![CDATA[ and end with the characters ]]>. For example:

<![CDATA[
   I'm now discussing the <element> tag of documents
   5 & 6: "Sales" and "Profit and Loss". Luckily,
   the XML processor won't apply rules of formatting
   to these sentences!
]]>

Note that entity references inside a CDATA section will not be expanded.

Syntax:

<![CDATA[ ... ]]>
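Python's standard xml.etree.ElementTree module makes the difference easy to see; this is a sketch using that one library, not something the XML specification prescribes. Text inside a CDATA section comes back exactly as written, while the same entity reference in ordinary text is expanded:

```python
import xml.etree.ElementTree as ET

# Inside CDATA, < and & are ordinary characters.
p = ET.fromstring('<p><![CDATA[5 & 6: the <element> tag]]></p>')
print(p.text)  # 5 & 6: the <element> tag

# Entity references are not expanded inside CDATA...
print(ET.fromstring('<p><![CDATA[&amp;]]></p>').text)  # &amp;
# ...but they are expanded in normal character data.
print(ET.fromstring('<p>&amp;</p>').text)  # &
```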

10.2.3. Element and Attribute Rules

Elements can contain text, other elements, or a combination of both. For example, a chapter might contain a title and multiple paragraphs, and a paragraph might contain text and emphasis elements:

<para>A paragraph can mix text and <emphasis>emphasis elements</emphasis>.</para>
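Parsers differ in how they hand mixed content (text and child elements together) to an application. As one illustration, assuming Python 3's xml.etree.ElementTree, the text before a child element is stored in the parent's text attribute and the text after it in the child's tail attribute:

```python
import xml.etree.ElementTree as ET

para = ET.fromstring(
    '<para>A paragraph can mix text and '
    '<emphasis>emphasis elements</emphasis> in one element.</para>')

print(repr(para.text))     # 'A paragraph can mix text and '
print(para[0].tag)         # emphasis
print(repr(para[0].text))  # 'emphasis elements'
print(repr(para[0].tail))  # ' in one element.'
# itertext() yields the parent's text, the child's text, and its tail in order.
print("".join(para.itertext()))
```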

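Element names must begin with a letter or an underscore, followed by any number of letters, digits, hyphens, periods, or underscores. Names are case-sensitive, so <Para> and <para> are two different element types. Colons are reserved for namespace prefixes, and names beginning with xml in any combination of upper- and lowercase are reserved by the XML specification. For example:

Example         Comment
<Italic>        Legal
<_Budget>       Legal
<Punch line>    Illegal: has a space
<205Para>       Illegal: starts with a number
<repair@log>    Illegal: contains @ character
<xmlbob>        Reserved: starts with xml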
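The naming rules can be approximated with a regular expression. This is a simplified, ASCII-only sketch in Python 3 (is_usable_element_name is a hypothetical helper): the full XML Name production also permits many non-ASCII letters and the colon, and most parsers accept names beginning with xml even though the specification reserves them, so treat this as a style check rather than a parser:

```python
import re

# Simplified ASCII-only naming rule: a letter or underscore first, then
# letters, digits, hyphens, periods, or underscores. Colons (namespace
# prefixes) and non-ASCII letters are deliberately rejected here.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_usable_element_name(name: str) -> bool:
    if not NAME.fullmatch(name):
        return False
    # Names beginning with "xml" in any case are reserved by the specification.
    return not name.lower().startswith("xml")


for candidate in ["Italic", "_Budget", "Punch line",
                  "205Para", "repair@log", "xmlbob"]:
    print(candidate, is_usable_element_name(candidate))
```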
