20.3. XML Syntax
For each section of this reference that maps directly to an XML
language structure, an informal syntax reference describes that
structure's form. The following conventions are used
with these syntax blocks:
Format | Meaning
DOCTYPE | Bold text indicates literal characters that must appear as written within the document (e.g., DOCTYPE).
encoding-name | Italicized text indicates that the user must replace the text with real data. The item indicates what type of data should be inserted (e.g., encoding-name = UTF-8).
| | The vertical bar | indicates that only one out of a list of possible values can be selected.
[ ] | Square brackets indicate that a particular portion of the syntax is optional.
20.3.1. Global Syntax Structures
Every XML document is broken into two primary sections: the
prolog and the document
element. A few documents may also have comments
or processing instructions that follow the root element in a sort of
epilog (an unofficial term). The prolog contains
structural information about the particular type of XML document you
are writing, including the XML declaration and document type
declaration. The prolog is optional, and if a document does not need
to be validated against a DTD, it can be omitted completely. The only
required structure in a well-formed XML document is the top-level
document element itself.
The following syntax structures are common to the
entire XML document. Unless otherwise noted within a subsequent
reference item, the following structures can appear anywhere within
an XML document.
Whitespace
is defined as a space, tab, or empty line
(which is composed of a carriage return, line feed, or combination of
the two). Whitespace serves the same purpose in XML as it does in
most programming and natural languages: to separate tokens and
language elements from one another. XML has simplified the task of
determining which whitespace is significant to an application and
which is not. To an XML parser, all whitespace in element content is
significant and will be passed to the client application. Whitespace
within tags--for instance, between attributes--is not
significant. Consider the following example:
<p> This sentence has extraneous
line breaks.</p>
After parsing, the character data from this example element is passed
to the underlying application as:
This sentence has extraneous
line breaks.
Though XML specifies that all whitespace in element content be
preserved for use by the client application, the XML author can use an
additional facility to signal that the spacing and formatting of an
element's character data should be preserved. For more information, see the
discussion of the xml:space attribute in Special Attributes later in this
chapter.
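As a rough illustration of the whitespace rules above, the following Python sketch uses the standard library's xml.etree.ElementTree parser. The sample document and its attribute names (a and b) are made up for the example. It only aims to show that whitespace in element content reaches the application unchanged, while whitespace between attributes leaves no trace.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: extra spaces between attributes, extra line breaks in content.
doc = '<p   a="1"\n    b="2">  This sentence has extraneous\n\nline breaks.</p>'

elem = ET.fromstring(doc)
content = elem.text              # whitespace in element content is kept verbatim
attributes = dict(elem.attrib)   # whitespace between attributes is not reported

print(repr(content))
print(attributes)
```

Whether the application then keeps or collapses that whitespace is its own decision; the parser simply hands it over.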
To ease the burden of those who write XML parsers, XML names must adhere to
the following lexical conventions:
-
Begin with a letter, _, or :
character.
-
After the first character, be composed only of letters, digits,
., -, _, and
: characters.
In this context, a letter is any Unicode character that matches the
Letter production from the EBNF grammar at the end
of this chapter.
According to the XML 1.0 specification, the :
character may be used freely within names,
although the character is now officially reserved as part of the
Namespaces in XML recommendation. Even if a document
does not use namespaces, the colon should still not be used within
identifiers to maintain compatibility with namespace-aware parsers.
See Section 20.3.4 in
this chapter for more information about how namespace-aware
identifiers are formed.
Names should also avoid starting with the three-letter sequence X, M,
L, unless specifically sanctioned by an XML specification.
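The naming rules above can be approximated with a regular expression. The sketch below is deliberately limited to ASCII letters, which is an assumption made for brevity: the real Letter production covers far more of Unicode. The function names are hypothetical helpers, not part of any XML library.

```python
import re

# ASCII-only approximation: the real Letter production covers many more
# Unicode characters than A-Z and a-z.
_NAME = re.compile(r'[A-Za-z_:][A-Za-z0-9._:-]*')

def is_ascii_xml_name(candidate: str) -> bool:
    """True if candidate follows the two lexical rules, ASCII letters only."""
    return _NAME.fullmatch(candidate) is not None

def is_reserved_name(candidate: str) -> bool:
    """True if candidate starts with X, M, L in any mix of upper- and lowercase."""
    return candidate[:3].lower() == 'xml'

print(is_ascii_xml_name('furniture_item'), is_ascii_xml_name('2nd_item'))
```

A name such as a.b-c:d passes the lexical check, but the colon would still clash with namespace-aware parsers, as the text explains.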
&#decimal-number;
&#xhexadecimal-number;
All XML parsers are based on the Unicode character set, no matter
what the external encoding of the XML file is. It is theoretically
possible to author documents directly in Unicode, but many
text-editing, storage, and delivery systems still use the ASCII
character set. To allow XML authors to include Unicode characters in
their documents' content without forcing them to
abandon their existing editing tools, XML provides the
character reference mechanism.
A character reference allows an author to insert a
Unicode character by number
into the output stream produced by the parser to an XML application.
Consider an XML document that includes the following character data:
&#xA9; 2002 O'Reilly &amp; Associates
In this example, the parser would replace the character reference
with the actual Unicode character and pass it to the client
application:
© 2002 O'Reilly & Associates
Character references may not be used in element or attribute names,
though they may be used in attribute values.
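The behavior described here can be checked with a short Python sketch that uses xml.etree.ElementTree. The element name notice and the attribute name mark are invented for the example. It shows the hexadecimal and decimal forms producing the same character, a reference inside an attribute value, and a reference inside a name being rejected.

```python
import xml.etree.ElementTree as ET

# Hexadecimal and decimal references to the same character (U+00A9).
hex_text = ET.fromstring("<notice>&#xA9; 2002 O'Reilly &amp; Associates</notice>").text
dec_text = ET.fromstring("<notice>&#169; 2002 O'Reilly &amp; Associates</notice>").text

# Character references are allowed in attribute values ...
attr_value = ET.fromstring('<notice mark="&#xA9;"/>').get('mark')

# ... but not in element or attribute names.
try:
    ET.fromstring('<a&#x62;/>')
    reference_in_name_ok = True
except ET.ParseError:
    reference_in_name_ok = False

print(hex_text)
```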
Besides
user-defined entity references, XML includes the five named entity
references shown in Table 20-1 that can be used
without being declared. These references are a subset of those
available in HTML documents.
Table 20-1. Predefined entities
Entity | Character | XML declaration
&lt; | < | <!ENTITY lt "&#38;#60;">
&gt; | > | <!ENTITY gt "&#62;">
&amp; | & | <!ENTITY amp "&#38;#38;">
&apos; | ' | <!ENTITY apos "&#39;">
&quot; | " | <!ENTITY quot "&#34;">
The &lt; and &amp;
entities must be used wherever < or
& appears in document content. The
&gt; entity is frequently used wherever
> appears in document content, but is only
mandatory to avoid putting the sequence ]]>
into content. &apos; and
&quot; are generally used only within
attribute values to avoid conflicts between the value and the quotes
used to contain the value.
Though the parser must recognize these entities regardless of whether
they have been declared, you can declare them in your DTD without
generating errors.
The presence of these "special"
predefined entities creates a conundrum within an XML document.
Because it is possible to use these references without declaring
them, it is possible to have a valid XML document that includes
references to entities that were never declared. The XML
specification actually encourages document authors to declare these
entities to maintain the integrity of the entity
declaration-reference rule. In practical terms, declaring these
entities only adds unnecessary complexity to your document.
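When generating XML from a program, the predefined entities are usually applied by a helper rather than by hand. The following sketch uses escape and quoteattr from Python's xml.sax.saxutils module; the sample strings and the element and attribute names are assumptions chosen for the example.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

raw_text = 'AT&T says 1 < 2'
raw_attr = '5" shelf'

escaped = escape(raw_text)     # replaces &, <, and > with entity references
quoted = quoteattr(raw_attr)   # adds surrounding quotes and escapes as needed

# Parsing the generated markup gives back the original strings.
parsed = ET.fromstring('<t note=%s>%s</t>' % (quoted, escaped))
print(escaped)
```

Because the value contains a double quote but no apostrophe, quoteattr can simply switch to single quotes instead of emitting an entity reference.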
CDATA (Character Data) Sections
<![CDATA[unescaped character & markup data]]>
XML documents consist of markup
and character data. The < or
& characters cannot be included inside normal
character data without using a character or entity reference, such as
&lt; or &amp;. By
using a reference, the resulting < and
& characters are not recognized as markup by
the parser, but will become part of the data stream to the
parser's client application.
For large blocks of character data--particularly if the data
contains markup, such as an HTML or XML fragment--the
CDATA section can be used. Within a
CDATA block, every character between the opening
and closing tag is considered character data. Thus, special
characters can be included in a CDATA section with
impunity, except for the CDATA closing sequence,
]]>.
CDATA sections are very useful for tasks such as
enclosing XML or HTML documents inside of tutorials explaining how to
use markup, but it is difficult to process the contents of
CDATA sections using XSLT, the DOM, or SAX as
anything other than text.
NOTE:
CDATA sections cannot be nested. The character
sequence ]]> cannot appear within data that is
being escaped, or the CDATA block will be closed
prematurely. This situation should not be a problem ordinarily, but
if an application includes XML documents as unparsed character data,
it is important to be aware of this constraint. If it is necessary to
include the CDATA closing sequence in the data,
close the open CDATA section, include the closing
characters using character references to escape them, then reopen the
CDATA section to contain the rest of the character
data.
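The workaround in the note can be sketched in a few lines of Python. The helper name wrap_in_cdata is hypothetical, and the sample string is made up; the approach is the one described above: close the section, write the closing sequence with a character reference, then reopen the section.

```python
import xml.etree.ElementTree as ET

def wrap_in_cdata(data: str) -> str:
    # Wherever the data contains "]]>": close the section, emit the three
    # characters with > written as a character reference, then reopen.
    return '<![CDATA[' + data.replace(']]>', ']]>]]&#62;<![CDATA[') + ']]>'

sample = 'if (a[b[0]]>1 && x<y) { ... }'   # happens to contain "]]>"
wrapped = wrap_in_cdata(sample)
recovered = ET.fromstring('<code>' + wrapped + '</code>').text

print(wrapped)
```

The parser delivers the pieces as one run of character data, so the application sees the original string.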
An XML entity can best be
understood as a macro replacement facility, in which the replacement
can be either parsed (the text becomes part of the XML document) or
unparsed. If unparsed, the entity declaration points to external
binary data that cannot be parsed. Additionally, the replacement text
for parsed entities can come from a string or the contents of an
external file. During parsing, a parsed entity reference is replaced
by the substitution text that is specified in the entity declaration.
The replacement text is then reparsed until no more entity or
character references remain.
To simplify document parsing, two distinct types
of entities are used in different situations: general and parameter.
The basic syntax for referencing both entity types is almost
identical, but specific rules apply to where each type can be used.
Parameter Entity References
When an XML parser encounters a parameter entity reference within a
document's DTD, it replaces the reference with the
entity's text. Whether the replacement text is
included as a literal or included from an external entity, the parser
continues parsing the replacement text as if it had always been a
part of the document. This parsing has interesting implications for
nested entity references:
<!ENTITY % YEAR "2001">
<!ENTITY COPYRIGHT "© %YEAR;">
. . .
<copyright_notice>&COPYRIGHT;</copyright_notice>
After the necessary entity replacements are made, the previous
example would yield the following canonical element:
<copyright_notice>© 2001</copyright_notice>
WARNING:
XML treats parameter entity references
differently depending on where they appear within the DTD. References
within the literal value of an entity declaration (such as
Copyright © %YEAR;) are valid only as
part of the external subset. Within the internal subset, parameter
entity references may occur only where a complete markup declaration
could exist. In other words, within the internal subset, parameter
references can be used only to include complete markup declarations.
Parameter entity references are recognized only within the DTD;
therefore, the % character has no significance
within character data and does not need to be escaped.
General Entity References
General entity references
are recognized only within the parsed character data in the body of
an XML document. They may appear within the parsed character data
between an element's start- and end-tags, or within the value of an
attribute. They are not recognized within a
document's DTD (except inside default values for
attributes) or within CDATA sections.
NOTE:
The sequence of operations that occurs when a parsed general entity
is included by the XML parser can lead to interesting side effects.
An entity's replacement text is, in turn, read by
the parser. If character or general entity replacements exist in the
entity replacement text, they are also parsed and included as parsing
continues.
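The reparsing described in this note can be observed with a small Python sketch. It is a variant of the earlier copyright example that assumes YEAR is declared as a general entity (rather than a parameter entity) so that everything fits in the internal subset, and it relies on xml.etree.ElementTree expanding internally declared entities.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE copyright_notice [
  <!ENTITY YEAR "2001">
  <!ENTITY COPYRIGHT "&#xA9; &YEAR;">
]>
<copyright_notice>&COPYRIGHT;</copyright_notice>"""

# &COPYRIGHT; is replaced, and the &YEAR; reference inside its
# replacement text is then replaced in turn.
notice = ET.fromstring(doc).text
print(notice)
```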
Comments can appear anywhere in your document or DTD, outside of
other markup tags. XML parsers are not required to preserve contents
of comment blocks, so they should be used only to store information
that is not a part of your application. In reality, most information
you might consider storing in a comment block probably should be made
an official part of your XML application. Rather than storing data
that will be read and acted on by an application in a comment, as is
frequently done in HTML documents, you should store it within the
element structure of the actual XML document. Enhancing the
readability of a complex DTD or temporarily disabling blocks of
markup are effective uses of comments.
NOTE:
The character sequence -- cannot be included
within a comment block, except as part of the tag closing text.
Because comments cannot be nested, commenting out a comment block is
impossible. If large blocks of markup that include comments must be
temporarily disabled, consider wrapping them in a
CDATA section to cause the parser to read them as
simple text instead of markup.
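Two of the points above (parsers need not preserve comments, and -- is not allowed inside one) can be seen with Python's standard library. This is only a sketch of how two particular APIs behave with their default settings; the sample element is invented.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

doc = '<item><!-- internal note -->bookcase</item>'

# ElementTree's default tree builder discards the comment ...
et_item = ET.fromstring(doc)
et_text = et_item.text
et_children = len(et_item)

# ... while minidom keeps it as a node of its own.
dom_first = minidom.parseString(doc).documentElement.firstChild
dom_is_comment = dom_first.nodeType == dom_first.COMMENT_NODE
dom_comment = dom_first.data

# "--" inside a comment is rejected as not well-formed.
try:
    ET.fromstring('<item><!-- a -- b --></item>')
    double_hyphen_ok = True
except ET.ParseError:
    double_hyphen_ok = False
```

An application that depended on the comment's contents would work with one API and silently fail with the other, which is the reason for keeping real data out of comments.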
<?target [processing-instruction data]?>
Processing instructions provide an escape mechanism that allows an
XML application to include instructions to an XML processor that are
not part of the XML markup or character data. The processing
instruction target can be any legal XML name, except
xml in any combination of upper- and lowercase
(see Chapter 2). Linking to a stylesheet to
provide formatting instructions for a document is a common use of
this mechanism. According to the principles of XML, formatting
instructions should remain separate from the actual content of a
document, but some mechanism must associate the two. Processing
instructions are significant only to applications that recognize
them.
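As an illustration of how an application sees a processing instruction, the following Python sketch reads a stylesheet link with xml.dom.minidom. The stylesheet file name style.css is a placeholder. The parser reports only a target and an uninterpreted data string; making sense of the data is left to the application.

```python
from xml.dom import minidom

doc = '<?xml-stylesheet type="text/css" href="style.css"?><doc/>'

pi = minidom.parseString(doc).firstChild
pi_is_instruction = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE
pi_target = pi.target   # the name that follows "<?"
pi_data = pi.data       # everything else, as plain text

print(pi_target, '->', pi_data)
```

Although the data looks like a pair of attributes, it is delivered as a single string and would have to be picked apart by the application itself.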
The notation facility can indicate exactly what type of processing
instruction is included, and each individual XML application must
decide what to do with the additional data. No action is required by
an XML parser when it recognizes that a particular processing
instruction matches a declared notation. When this facility is used,
applications that do not recognize the public or system identifiers
of a given processing instruction target should realize that they
could not properly interpret its data portion.
Character Encoding Autodetection
The XML declaration must be the very first item in
a document so that the XML parser can determine which character
encoding was used to store the document. A chicken-and-egg problem
exists, involving the XML declaration's
encoding="..." clause: the parser
can't parse the clause if it
doesn't know what character encoding the document
uses. However, since the first five characters of your document must
be the string <?xml (if it includes an XML
declaration), the parser can read the first few bytes of your
document and, in most cases, determine the character encoding before
it has read the encoding declaration.
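A simplified version of this detection step might look like the Python sketch below. It is not a complete implementation: the function name and the returned labels are made up, UTF-32 is ignored, and an ASCII-compatible result only means that the encoding declaration can now be read to find the exact encoding.

```python
import codecs

def sniff_encoding_family(head: bytes) -> str:
    """Guess the encoding family from the first bytes of a document."""
    # Byte-order marks first. (A UTF-32 BOM is not distinguished here.)
    if head.startswith(codecs.BOM_UTF8):
        return 'utf-8'
    if head.startswith(codecs.BOM_UTF16_BE):
        return 'utf-16-be'
    if head.startswith(codecs.BOM_UTF16_LE):
        return 'utf-16-le'
    # No BOM: look at how the leading "<?" was stored.
    if head.startswith(b'\x00<\x00?'):
        return 'utf-16-be'
    if head.startswith(b'<\x00?\x00'):
        return 'utf-16-le'
    if head.startswith(b'<?xm'):
        return 'ascii-compatible'   # UTF-8, ISO-8859-x, ...: read the declaration
    if head.startswith(b'\x4c\x6f\xa7\x94'):
        return 'ebcdic'
    return 'unknown (assume utf-8)'

print(sniff_encoding_family(b'<?xml version="1.0"?><a/>'))
```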
<?xml version="1.0" [encoding="encoding-name"][ standalone="yes|no"]?>
The XML declaration serves several purposes. It tells the parser what
version of the specification was used, how the document is encoded,
and whether the document is completely self-contained or has
references to external entities.
The XML declaration, if included, must be the first thing that
appears in an XML document. Nothing, except possibly a Unicode
byte-order mark, may appear before this structure's
initial < character.
The version information attribute denotes
which version of the XML specification was used to create the current
document. At this time, the only valid version is
1.0.
... encoding="encoding-name" ...
The
encoding declaration, if present, indicates which character-encoding
scheme was used to store the document. Although all XML documents are
ultimately handled as Unicode by the parser, the external storage
scheme may be anything from a text file using the Latin-1
character set (ISO-8859-1) to a file with native Japanese characters.
The XML specification requires parsers to recognize only
UTF-8 and UTF-16 encoded documents, but many
parsers also support additional character encodings. For a thorough
discussion of character-encoding schemes, see Chapter 26.
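The following Python sketch shows the effect of the encoding declaration on a document stored as Latin-1. The element name and its content are invented for the example, and it assumes a parser that supports ISO-8859-1, as the one behind xml.etree.ElementTree does.

```python
import xml.etree.ElementTree as ET

text_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><name>caf\u00e9</name>'
stored = text_doc.encode('iso-8859-1')   # one byte per character on disk

# The parser reads the declaration and decodes the bytes to Unicode.
name = ET.fromstring(stored).text
print(name)
```

The bytes must be passed to the parser undecoded; if the program decodes them first, the declaration no longer describes what the parser receives.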
... standalone="yes|no" ...
If
a document is completely self-contained (the DTD, if there is one, is
contained completely within the original document), then the
standalone="yes" declaration may be used. If this
declaration is not given, the value no is assumed,
and all external entities are read and parsed. It is possible to
convert any document that has standalone="no" into a
standalone document by replacing each external entity reference with
the text contained in the external entity file.
From the standpoint of an XML application developer, this flag has no
effect on how a document is parsed. However, if it is given, it must
be accurate. Setting standalone="yes" when a
document does require DTD declarations that are not present in the
main document file is a violation of XML validity rules.
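An application that wants to inspect the XML declaration can ask the parser to report it. The sketch below uses the XmlDeclHandler callback of Python's xml.parsers.expat module, which passes standalone as 1, 0, or -1 (declaration absent). The helper name read_declaration is hypothetical.

```python
from xml.parsers import expat

def read_declaration(doc: bytes) -> dict:
    """Return the fields of the XML declaration as reported by the parser."""
    seen = {}

    def on_declaration(version, encoding, standalone):
        seen.update(version=version, encoding=encoding, standalone=standalone)

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(doc, True)
    return seen

full = read_declaration(b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><a/>')
minimal = read_declaration(b'<?xml version="1.0"?><a/>')
print(full)
```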
20.3.2. DTD (Document Type Definition)
Chapter 2 explained the difference
between well-formed and valid documents. Well-formed documents that
include and conform to a given DTD are considered valid. Documents
that include a DTD and violate the rules of that DTD are invalid. The
DTD consists of the DOCTYPE declaration and
both the internal subset (declarations contained
directly within the document) and the external
subset (declarations that are included from outside the
main document).
The parameter entity mechanism is a simple macro
replacement facility that is only valid within the context of the
DTD. Parameter entities are declared and then referenced from within
markup or possibly from within other entity declarations. The source
of the entity replacement text can be either a literal string or the
contents of an external file. Parameter entities simplify maintenance
of large, complex documents by allowing authors to build libraries of
commonly used entity declarations.
Parameter Entity Declarations
<!ENTITY % name "Replacement text.">
<!ENTITY % name SYSTEM "system-literal">
<!ENTITY % name PUBLIC "pubid-literal" "system-literal">
Parameter entities are declared within the
document's DTD and must be declared before they are
used. The declaration provides two key pieces of information:
-
The name of the entity, which is used when it is referenced
-
The replacement text, either directly or indirectly through a link to
an external entity
Be aware that an XML parser performs some preprocessing on the
replacement text before it is used in an entity reference. Most
importantly, parameter entity references in the replacement text are
recursively expanded before the final version of the replacement text
is stored. Character references are also replaced immediately with
the specified character. This replacement can lead to unexpected side
effects, particularly when constructing parameter entities that
declare other parameter entities. For a full description of how entity
replacement is implemented by an XML parser and what kinds of
unexpected side effects can occur, see Appendix D of the XML 1.0
specification. The specification is available on the World Wide Web
Consortium web site (http://www.w3.org/TR/REC-xml#sec-entexpand).
General entities are declared within the document type definition and
then referenced within the document's text and
attribute content. When the document is parsed, the
entity's replacement text is substituted for the
entity reference. The parser then resumes parsing, starting with the
text that was just replaced.
General entities are declared within
the DTD using a superset of the syntax used to declare parameter
entities. Besides the ability to declare internal parsed entities and
external parsed entities, you can declare external unparsed entities
and associate an XML notation name with them.
Internal entities
are used when the replacement text can be efficiently stored inline
as a literal string. The replacement text within an internal entity
is included completely in the entity declaration itself, obviating
the need for an external file to contain the replacement text. This
situation closely resembles the string replacement macro facilities
found in many popular programming languages and environments:
<!ENTITY name "Replacement text">
There are two types of external entities: parsed
and unparsed. When a parsed entity is referenced, the contents of the
external entity are included in the document, and the XML parser
resumes parsing, starting with the newly included text. When an
unparsed entity is referenced, the parser supplies the application
with the unparsed entity's URI, but it does not
insert that data into the document or parse it. What to do with that
URI is up to the application. Any entity declared with an XML
notation name associated with it is an external unparsed entity, and
any references to it within the document must be made using attribute
values of type ENTITY or
ENTITIES:
<!ENTITY name SYSTEM "system-literal">
<!ENTITY name PUBLIC "pubid-literal" "system-literal">
<?xml[ version="1.0"] encoding="encoding-name"?>
Files that contain external
parsed entities must include a text declaration if the entity file
uses a character encoding other than UTF-8 or UTF-16. This
declaration would be followed by the replacement text of the external
parsed entity.
NOTE:
External parsed entities may contain only document content or a
completely well-formed subset of the DTD. This restriction is
significant because it indicates that external parameter entities
cannot be used to play token-pasting games by splitting XML syntax
constructs into multiple files, then expecting the parser to
reassemble them.
It may be necessary
at times to include data in your XML
document that should not be parsed. For instance, your XML document
may need to include pointers to graphics files that will be used by
an application. These files are logically part of the document, but
should not be parsed. The XML language allows you to declare external
unparsed entities that can be included as attribute values within the
content of your document:
<!ENTITY name SYSTEM "system-literal" NDATA notation_name>
<!ENTITY name PUBLIC "pubid-literal" "system-literal" NDATA notation_name>
To include unparsed entities, you must first declare a notation that
will be referenced in the actual entity declaration:
<!NOTATION gif SYSTEM "images/gif">
Then declaring the entity itself is possible:
<!ENTITY bookcase_pic SYSTEM "bookcase.gif" NDATA gif>
As an unparsed general entity, it can be referenced only as an
attribute value of type ENTITY or
ENTITIES:
<picture src="bookcase_pic" type="gif"/>
When an XML parser parses this element, the information contained in
the entity and notation declarations can be used to identify the
actual type of data stored in the external entity. For example, a
program could choose to display the contents of a GIF external entity
on the screen, once the actual format is known.
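To show what information actually reaches an application, the sketch below feeds the declarations from this section to Python's xml.parsers.expat and records what the NotationDeclHandler and UnparsedEntityDeclHandler callbacks report. The ELEMENT and ATTLIST declarations for picture are assumptions added so the sample is self-contained. Note that the parser never opens bookcase.gif; it only passes the identifiers along.

```python
from xml.parsers import expat

doc = b"""<!DOCTYPE picture [
  <!NOTATION gif SYSTEM "images/gif">
  <!ENTITY bookcase_pic SYSTEM "bookcase.gif" NDATA gif>
  <!ELEMENT picture EMPTY>
  <!ATTLIST picture src ENTITY #REQUIRED>
]>
<picture src="bookcase_pic"/>"""

notations = {}    # notation name -> system identifier
unparsed = {}     # entity name -> (system identifier, notation name)
references = []   # (element name, unparsed entity named by src)

def on_notation(name, base, system_id, public_id):
    notations[name] = system_id

def on_unparsed_entity(name, base, system_id, public_id, notation_name):
    unparsed[name] = (system_id, notation_name)

def on_start(tag, attrs):
    # The attribute value is just a name; the application resolves it.
    if attrs.get('src') in unparsed:
        references.append((tag, attrs['src']))

parser = expat.ParserCreate()
parser.NotationDeclHandler = on_notation
parser.UnparsedEntityDeclHandler = on_unparsed_entity
parser.StartElementHandler = on_start
parser.Parse(doc, True)
```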
NOTE:
XLink and similar mechanisms are commonly used in place of unparsed
entities.
The document type declaration
can include part or all of the document type definition from an
external file. This external portion of the DTD is referred to as the
external DTD subset and may contain markup declarations, conditional
sections, and parameter entity references. It must include a text
declaration if the character encoding is not UTF-8 or UTF-16:
<?xml[ version="1.0"] encoding="encoding-name"?>
This declaration (if present) would then be followed by a series of
complete DTD markup statements, including ELEMENT,
ATTLIST, ENTITY, and
NOTATION declarations, as well as conditional
sections, and processing instructions. For example:
<!ELEMENT furniture_item (desc, %extra_tags; user_tags?, parts_list,
assembly+)>
<!ATTLIST furniture_item
xmlns CDATA #FIXED "http://namespaces.oreilly.com/furniture/"
>
...
The internal DTD subset is the
portion of the document type definition included directly within the
document type declaration between the [ and
] characters. The internal DTD subset can contain
markup declarations and parameter entity references, but not
conditional sections. A single document may have both internal and
external DTD subsets, which, when taken together, form the complete
document type definition. The following example shows the internal
subset, which appears between the [ and
] characters:
<!DOCTYPE furniture_item SYSTEM "furniture.dtd"
[
<!ENTITY % bookcase_ex SYSTEM "Bookcase_ex.ent">
%bookcase_ex;
<!ENTITY bookcase_pic SYSTEM "bookcase.gif" NDATA gif>
<!ENTITY parts_list SYSTEM "parts_list.ent">
]>
Element type declarations provide a template for
the actual element instances that appear within an XML document. The
declaration determines what type of content, if any, can be contained
within elements with the given name. The following sections describe
the various element content options available.
NOTE:
Since
namespaces are not explicitly
included in the XML 1.0 recommendation, element and attribute
declarations within a DTD must give the complete (qualified) name
that will be used in the target document. This means that if
namespace prefixes will be used in instance documents, the DTD must
declare them just as they will appear, prefixes and all. While
parameter entities may allow instance documents to use different
prefixes, this still makes complete and seamless integration of
namespaces into a DTD-based application very awkward.
Elements that are declared empty cannot contain content or nested
elements. Within the document, empty elements may use one of the
following two syntax forms:
<name [attribute="value" ...]/>
<name [attribute="value" ...]></name>
The ANY content specifier acts as a wildcard, allowing elements of this
type to contain character data or instances of any valid element
types that are declared in the DTD.
Mixed Content Element Type
<!ELEMENT name (#PCDATA [ | name]+)*>
<!ELEMENT name (#PCDATA)>
Element declarations that include the #PCDATA
token can include text content mixed with other nested elements that
are declared in the optional portion of the element declaration. If
the #PCDATA token is used, it is not possible to
limit the number of times or sequence in which other nested elements
are mixed with the parsed character data. If only text content is
desired, the asterisk is optional.
<!ELEMENT name (child_node_regexp)[? | * | +]>
XML provides a simple regular-expression syntax that can be used to
limit the order and number of child elements within a parent element.
This language includes the following operators:
Operator | Meaning
Name | Matches an element of the given name
( ... ) | Groups expressions for processing as sets of sequences (using the comma as a separator) or choices (using | as a separator)
? | Indicates that the preceding name or expression can occur zero or one times at this point in the document
* | Indicates that the preceding name or expression can occur zero or more times at this point in the document
+ | Indicates that the preceding name or expression must occur one or more times at this point in the document
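Because the operators above mirror ordinary regular-expression operators, a content model can be translated almost mechanically. The Python sketch below is only an illustration of that correspondence, not how a validating parser is implemented; the function names are hypothetical, and it handles element content models only (no #PCDATA, EMPTY, or ANY).

```python
import re

def model_to_regex(model: str):
    """Translate a DTD element content model into a Python regex."""
    parts = []
    for token in re.findall(r'[A-Za-z_:][\w.:\-]*|[(),|?*+]', model):
        if token == '(':
            parts.append('(?:')
        elif token == ',':
            continue                      # sequence: plain concatenation
        elif token in ')|?*+':
            parts.append(token)           # same meaning in both notations
        else:
            parts.append('(?:' + re.escape(token) + ' )')
    return re.compile(''.join(parts))

def children_match(model: str, children: list) -> bool:
    """Check a list of child element names against a content model."""
    joined = ''.join(name + ' ' for name in children)
    return model_to_regex(model).fullmatch(joined) is not None

model = '(desc, user_tags?, parts_list, assembly+)'
print(children_match(model, ['desc', 'parts_list', 'assembly', 'assembly']))
```

Each child name is followed by a space in the joined string, so a name that is a prefix of another (for example desc and description) cannot match by accident.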
Attribute List Declaration
<!ATTLIST element_name [attribute_name attribute_type default_decl]*>
In a valid XML document it is necessary to declare the attribute
names, types, and default values that are used with each element
type.
The attribute name must obey the rules for XML identifiers, and no
duplicate attribute names may exist within a single declaration.
Attributes
are declared as having a specific type. Depending on the declared
type, a validating XML parser will constrain the values that appear
in instances of those attributes within a document. The following
table lists the various attribute types and their meanings:
Attribute type | Meaning
CDATA | Simple character data.
ID | A unique ID value within the current XML document. No two ID attribute values within a document can have the same value, and no element can have two attributes of type ID.
IDREF, IDREFS | A single reference to an element ID (IDREF) or a list of IDs (IDREFS), separated by spaces. Every ID token must match the value of an ID-type attribute that appears somewhere within the document.
ENTITY, ENTITIES | A single reference to a declared unparsed external entity (ENTITY) or a list of references (ENTITIES), separated by spaces.
NMTOKEN, NMTOKENS | A single name token value (NMTOKEN) or a list of name tokens (NMTOKENS), separated by spaces.
... NOTATION (notation [| notation]*) ...
The NOTATION attribute mechanism lets XML document
authors indicate that the character content of some elements obeys the
rules of some formal language other than XML. The following short
sample document shows how notations might be used to specify the type
of programming language stored in the
code_fragment element:
<?xml version="1.0"?>
<!DOCTYPE code_fragment
[
<!NOTATION java_code PUBLIC "Java source code">
<!NOTATION c_code PUBLIC "C source code">
<!NOTATION perl_code PUBLIC "Perl source code">
<!ELEMENT code_fragment (#PCDATA)>
<!ATTLIST code_fragment
code_lang NOTATION (java_code | c_code | perl_code) #REQUIRED>
]>
<code_fragment code_lang="c_code">
main( ) { printf("Hello, world."); }
</code_fragment>
Enumeration Attribute Type
... (name_token [| name_token]*) ...
This syntax limits the possible values of the given attribute to one
of the name tokens from the provided list:
<!ELEMENT door EMPTY>
<!ATTLIST door
state (open | closed | missing) "open">
. . .
<door state="closed"/>
If an optional attribute is not present on a given element, a default
value may be provided to be passed by the XML parser to the client
application. The following table shows various forms of
the attribute default value clause and their meanings:
Default value clause | Explanation
#REQUIRED | A value must be provided for this attribute.
#IMPLIED | A value may or may not be provided for this attribute.
[#FIXED ]"default value" | If this attribute has no explicit value, the XML parser substitutes the given default value. If the #FIXED token is provided, this attribute's value must match the given default value. In either case, the parent element always has an attribute with this name.
The #FIXED modifier indicates that the attribute
may contain only the value given in the attribute declaration.
Although redundant, it is possible to provide an explicit attribute
value on an element when the attribute was declared as
#FIXED. The only restriction is that the attribute
value must exactly match the value given in the
#FIXED declaration.
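The substitution of default values can be observed with the door declaration from the enumeration example. The Python sketch below assumes the declarations sit in the internal subset, where the non-validating parser behind xml.etree.ElementTree reads them and supplies the default.

```python
import xml.etree.ElementTree as ET

dtd = """<!DOCTYPE door [
  <!ELEMENT door EMPTY>
  <!ATTLIST door state (open | closed | missing) "open">
]>
"""

# No state attribute in the instance: the declared default is supplied.
defaulted = ET.fromstring(dtd + '<door/>').get('state')

# An explicit value takes precedence over the default.
explicit = ET.fromstring(dtd + '<door state="closed"/>').get('state')

print(defaulted, explicit)
```

Since this parser does not validate, it would also accept a value outside the enumerated list; rejecting such a value is the job of a validating parser.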
Some
attributes are significant to XML and must be declared and
implemented in a particular way:
- xml:space
-
The xml:space attribute tells an XML application
whether the whitespace within the specified element is significant:
<!ATTLIST element_name xml:space (default|preserve) default_decl>
<!ATTLIST element_name xml:space (default) #FIXED 'default' >
<!ATTLIST element_name xml:space (preserve) #FIXED 'preserve' >
- xml:lang
-
The xml:lang attribute allows a
document author to specify the human language of an
element's character content. If used in a valid XML
document, the document type definition must include an attribute type
declaration with the xml:lang attribute name. See
Chapter 5 for an explanation of language support
in XML.
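In an application, these two attributes arrive like any other attribute, qualified by the namespace URI that is permanently bound to the xml prefix. The following Python sketch reads them with xml.etree.ElementTree. The sample document and the languages helper are invented for the example, and the helper assumes the usual rule that an xml:lang value applies to an element's descendants until it is overridden.

```python
import xml.etree.ElementTree as ET

XML_NS = 'http://www.w3.org/XML/1998/namespace'   # bound to the xml prefix

doc = ('<manual xml:lang="en-US">'
       '<title>Bookcase</title>'
       '<note xml:lang="fr" xml:space="preserve">  Attention  </note>'
       '</manual>')

def languages(elem, inherited=None):
    """Yield (tag, effective language) for elem and its descendants."""
    lang = elem.get('{%s}lang' % XML_NS, inherited)
    yield elem.tag, lang
    for child in elem:
        yield from languages(child, lang)

root = ET.fromstring(doc)
effective = dict(languages(root))
note = root.find('note')
space_hint = note.get('{%s}space' % XML_NS)

print(effective)
```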
<!NOTATION notation_name SYSTEM "system-literal">
<!NOTATION notation_name PUBLIC "pubid-literal">
<!NOTATION notation_name PUBLIC "pubid-literal" "system-literal">
Notation declarations are used to provide information to an XML
application about the format of the document's
unparsed content. Notations are used by unparsed external entities,
processing instructions, and some attribute values.
Notation information is not significant to the XML parser, but it is
preserved for use by the client application. The public and system
identifiers are made available to the client application so that it
may correctly interpret non-XML data and processing instructions.
The conditional section markup provides
support for conditionally including and excluding content at parse
time within an XML document's external subset.
Conditional sections are not allowed within a
document's internal subset. The following example
illustrates a likely application of conditional sections:
<!ENTITY % debug 'IGNORE' >
<!ENTITY % release 'INCLUDE' >
<!ELEMENT addend (#PCDATA)>
<!ELEMENT result (#PCDATA)>
<![%debug;[
<!ELEMENT sum (addend+, result)>
]]>
<![%release;[
<!ELEMENT sum (result)>
]]>
20.3.3. Document Body
Elements are an XML
document's lifeblood. They provide the structure for
character data and attribute values that make up a particular
instance of an XML document type definition. The
!ELEMENT and !ATTLIST
declarations from the DTD restrict the possible contents of an
element within a valid XML document. Combining elements and/or
attributes that violate these restrictions generates an error in a
validating parser.
<element_name [attribute_name="attribute value"]*> ...</element_name>
Elements that have content (either character data, other elements, or
both) must start with a start-tag and end with an element end-tag.
<element_name [attribute_name="attribute value"]*></element_name>
<element_name [attribute_name="attribute value"]* />
Empty
elements
have no content and are written using either the start- and end-tag
syntax mentioned previously or the empty-element syntax. The two
forms are functionally identical, but the empty-element syntax is
more succinct and more frequently used.
attribute_name="attribute value"
attribute_name='attribute value'
Elements may include attributes. The order of attributes within an
element tag is not significant and is not guaranteed to be preserved
by an XML parser. Attribute values must appear within either single
or double quotation marks. Attribute values within a document must conform
to the rules explained in Section 20.4.1
of this chapter.
Note that whitespace may appear around the =
character.
The value that appears in the quoted string is tested for validity,
depending on the attribute type provided in the
!ATTLIST declaration for the element type.
Attribute values can contain general entity references, but cannot
contain references to external parsed entities. See Section 20.4.1
of this chapter for more information about attribute-value
restrictions.
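Several of the points in this section (the two empty-element forms, attribute order, quoting style, and whitespace around =) can be confirmed with a short Python sketch. The shelf element and its attributes are made up for the example.

```python
import xml.etree.ElementTree as ET

# Same element written two ways: different attribute order, quoting style,
# whitespace around "=", and empty-element form.
a = ET.fromstring('<shelf width="30" depth=\'12\'/>')
b = ET.fromstring('<shelf depth = "12" width = "30"></shelf>')

same_attributes = a.attrib == b.attrib
both_empty = (a.text is None and len(a) == 0 and
              b.text is None and len(b) == 0)

print(same_attributes, both_empty)
```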
20.3.4. Namespaces
Although namespace support was not
part of the original XML 1.0 recommendation, Namespaces in
XML was approved less than a year later
(January 14, 1999). Namespaces are used to distinguish the
element and attribute names of a given XML application from those of
other applications. See Chapter 4 for more
detailed information.
The following sections describe how namespaces impact the formation
and interpretation of element and attribute names within an XML
document.
An
unqualified
name is
an XML element or attribute name that is not associated with a
namespace. This could be because it has no namespace prefix and no
default namespace has been declared. All unprefixed attribute names
are unqualified because they are never automatically associated with
a default namespace. XML parsers that do not implement namespace
support (of which there are very few) or parsers that have been
configured to ignore namespaces will always return unqualified names
to their client applications. Two unqualified names are considered to
be the same if they are lexically identical.
A
qualified name is an element
or attribute name that is associated with an XML namespace. There are
three possible types of qualified names:
Unlike unqualified names, qualified names are considered the same
only if their namespace URIs (from their namespace declarations) and
their local parts match.
Default Namespace Declaration
When this
attribute is included in an element
start-tag, it and any unprefixed elements contained within it are
automatically associated with the namespace URI given. If the
xmlns attribute is set to the empty string, any
effective default namespace is ignored, and unprefixed elements are
not associated with any namespace.
NOTE:
An important caveat about default namespace declarations is that they
do not affect unprefixed attributes. Unprefixed attributes are never
explicitly named in any namespace, even if their containing element
is.
Namespace Prefix Declaration
xmlns:prefix="namespace_URI"
This declaration associates the
namespace URI given with the prefix name given. Once it has been
declared, the prefix may qualify the current element name, attribute
names, or any other element or attribute name within the scope of the
element that declares it. Nested elements may redefine a given
prefix, using a different namespace URI if desired.
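The rules for qualified and unqualified names can be seen in how a namespace-aware API reports them. The Python sketch below uses xml.etree.ElementTree, which writes qualified names as {namespace URI}local-part. The element names item and part and the prefix f are invented; the URI is the one from the furniture example earlier in this chapter.

```python
import xml.etree.ElementTree as ET

URI = 'http://namespaces.oreilly.com/furniture/'

prefixed = ET.fromstring('<f:item xmlns:f="%s" id="b1"/>' % URI)
defaulted = ET.fromstring(
    '<item xmlns="%s" id="b1"><part xmlns=""/></item>' % URI)

# Same URI and same local part: the names are considered the same,
# whichever prefix (or default declaration) was used.
item_names_match = prefixed.tag == defaulted.tag

# The unprefixed attribute is not placed in the default namespace.
attribute_names = list(defaulted.attrib)

# xmlns="" switches the default namespace off for the nested element.
inner_tag = defaulted[0].tag

print(prefixed.tag)
```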
Copyright © 2002 O'Reilly & Associates. All rights reserved.