type of element or complex datatype that cannot be used directly in
the instance documents. An abstract element must be substituted and
is usually the head of a substitution group. An abstract complex type
may be used to define content models, in which case the type will
have to be substituted in the instance documents using
xsi:type. There is no feature to define simple
types as abstract (even though the predefined type
xs:NOTATION could be considered abstract).
In a regular expression, an atom expresses a condition on a
substring. Atoms may be followed by a quantifier defining the
expected number of the atom's occurrences. The atom,
with its optional number of occurrences, constitutes a
"piece." An atom may be a
character, a wildcard, a special character, a character class, or a
A simple type that is not derived by list or union from another
Pieces of information attached to an element and defined in its start
tag. Considered child nodes by the XPath data model, and considered
property nodes by the DOM, attributes are
"information items" to the XML
Containers that allow you to define, reference, and redefine groups
The datatype that is used as the starting point to define a new
datatype by derivation by restriction or extension.
Inventor of HTML and HTTP, and Director of the W3C; he is considered
the father of the World Wide Web (see http://www.w3.org/People/all#timbl).
Elements and complex datatypes that cannot be substituted in the
instance documents. A blocked element or complex type is restricted
in the substitutions that may occur in the instance documents. There
is no feature to block simple types.
canonical lexical representation
When a value in the value space may have different lexical
representations in the lexical space, the W3C XML Schema
Recommendation provides (when possible) a canonical representation,
which is the most "normal" or
"classical" and may be used as a
reference. Although most of the types have canonical representations,
some, such as xs:duration or
xs:QName, do not have one.
Including a schema without a target namespace in a schema with a target
namespace is known as "chameleon
design." This is because the included schema takes
the target namespace of the schema in which it is included, just as a
chameleon takes the color of the environment in which it is placed.
In a regular expression, a character class is an atom matching a set
of characters. Character classes may be classical Perl character
classes, Unicode character classes, or user-defined character
classical Perl character class
A set of character classes designated by a single letter, for which
upper- and lowercases of the same letter are complementary (for
instance, "\d" is all the decimal
digits, and "\D" is all the
characters that are not decimal digits).
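The complementary behavior of "\d" and "\D" can be observed with a short sketch. It uses Python's re module rather than a W3C XML Schema processor, on the assumption that the two dialects agree on these two classes for the sample string, which is made up for the example.

```python
import re

sample = "Room 42, floor 3"

# \d matches every decimal digit; \D matches every character that is not one.
digits = re.findall(r"\d", sample)
non_digits = re.findall(r"\D", sample)

# The two classes are complementary, so each character of the sample
# falls into exactly one of them.
assert len(digits) + len(non_digits) == len(sample)

print(digits)  # ['4', '2', '3']
```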
An element has a complex content model when it has child element
nodes only (and no text node).
Something that can be defined and referenced in a schema. Elements,
attributes, simple and complex types, and element and attribute
groups are components.
Containers that allow the manipulation of a set of elements as a
whole and define their relative order. Compositors include
xs:sequence, xs:choice, and
xs:all. Compositors may be included in other
compositors to form complex combinations (with some limitations).
Most can also be used as particles and have
minOccurs and maxOccurs
attributes, which allow definition of the number of repetitions
expected for the whole group of elements that they define. The child
elements of a compositor are
"particles." A restriction applies
to xs:all as a compositor: it can only include
Consistent Declaration rule
This states that an element referenced by one
"location" in a schema cannot be
associated with two different simple or complex types.
A description of the structure of children elements and text nodes
(independent of attributes). The content model is
"simple" when there is a text node
but no elements, "complex" when
there are element nodes but no text,
"mixed" when there are text and
element nodes, and "empty" when
there are neither text nor element nodes. These definitions are
commonly used by XML developers and slightly different from those of
W3C XML Schema, for which there are only simple and complex content
models. (Mixed models are considered special cases of complex
contents, and empty models are considered either simple or complex
contents with no child nodes.)
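The four informal categories above can be sketched as a small classifier. This is an illustration only: the function name content_model and the sample document are invented here, and the sketch assumes that whitespace-only text does not count as a text node.

```python
import xml.etree.ElementTree as ET

def content_model(element):
    """Classify an element as 'empty', 'simple', 'complex', or 'mixed'."""
    children = list(element)
    # Text can appear before the first child (element.text) or after any
    # child (child.tail). Assumption: whitespace-only text is ignored.
    chunks = [element.text] + [child.tail for child in children]
    has_text = any(chunk and chunk.strip() for chunk in chunks)
    if children:
        return "mixed" if has_text else "complex"
    return "simple" if has_text else "empty"

# Hypothetical sample document; attributes play no part in the result.
doc = ET.fromstring(
    "<book id='b1'>"
    "<title>XML Schema</title>"
    "<note>See <ref>chapter 1</ref> first.</note>"
    "<br/>"
    "</book>"
)

print(content_model(doc))  # complex
for child in doc:
    print(child.tag, content_model(child))
```

The loop reports title as simple, note as mixed, and br as empty.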
A term used by W3C XML Schema to qualify both the content and the
structure of an element or attribute. Datatypes can be either simple
(when they describe an attribute or an element without an embedded
element or attribute) or complex (when they describe elements with
embedded child elements or attributes). W3C XML Schema datatypes
should not be confused with XML 1.0 element types, which are called
element names by W3C XML Schema.
A value that is used when no value is provided in the instance
document. Default values apply to attributes that are missing from
the instance documents and to elements that are present but empty.
The action of defining a datatype by using the definition of one or
several other datatypes. Simple datatypes may be defined by
derivation by restriction, list, or union, while complex datatypes
can be defined by derivation by restriction or extension.
derivation by extension
derivation by list
The action of using a simple datatype (called the list type) to
define a new simple datatype as a whitespace-separated list of values
of the list type. Derivation by list applies only to simple
derivation by restriction
For simple datatypes, a derivation by restriction is the action of
defining a simple datatype by adding new constraints (called facets)
on the lexical or value space of an existing datatype (called the
base type). For complex datatypes, a derivation by restriction is the
action of giving a new content model for the datatype that is a
restriction of the base type.
derivation by union
The action of using a set of simple datatypes (called the member
types) to define a new simple datatype whose lexical space is the
union of the lexical spaces of the member types.
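As a rough illustration of this definition, the sketch below models each member type as a predicate on lexical values and the union as "accepted by at least one member." The two member checks are simplified stand-ins invented for the example (an integer-like type and a single-token type), not the actual W3C XML Schema definitions.

```python
import re

def is_integer(lexical):
    # Simplified stand-in for an integer member type: optional sign, ASCII digits.
    return re.fullmatch(r"[+-]?[0-9]+", lexical) is not None

def is_unbounded(lexical):
    # Simplified stand-in for a member type with a single allowed token.
    return lexical == "unbounded"

MEMBER_TYPES = [is_integer, is_unbounded]

def in_union(lexical):
    """A value is in the union's lexical space if any member type accepts it."""
    return any(check(lexical) for check in MEMBER_TYPES)

print(in_union("42"), in_union("unbounded"), in_union("many"))
# True True False
```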
A datatype that is defined by derivation from other datatypes. Derived
datatypes are user-defined when defined in a schema, or predefined when
defined in the W3C XML Schema Recommendation.
Document Object Model. An object-oriented model of XML documents,
including the definition of the API allowing its manipulation. The
third version of DOM (DOM Level 3) will include an API named
"Abstract Schemas" to facilitate
schema-guided editing of XML documents (see http://www.w3.org/TR/DOM-Level-3-Core).
Document Schema Definition Language (DSDL) is a project undertaken by
the ISO (ISO/IEC JTC 1/SC 34/WG 1, to be precise) whose objective is
"to create a framework within which multiple
validation tasks of different types can be applied to an XML document
in order to achieve more complete validation results than just the
application of a single technology" (see http://dsdl.org). DSDL has classified W3C XML
Schema as "object-oriented schema
Document Type Definition. XML 1.0 DTDs are inherited from SGML, in
which DTDs played a central role because they included rules that
customize the markup itself. Because of these syntactical rules,
SGML applications need a DTD to be able
to read an SGML document. One of the simplifications of XML is to
state that an XML parser should be able to read a document without
needing a DTD. DTDs have therefore been simplified compared with their SGML
ancestors and remain the first incarnation of what is today called an
XML Schema language.
One of the basic types of nodes in the tree represented by an XML
document. An element is delimited by start and end tags. In the
corresponding tree, an element is a nonterminal node, which may have
subnodes of type element, character (text), namespace, and
attribute, as well as comment and processing instruction nodes.
Term used in the XML 1.0 Recommendation, which is equivalent to the
notion of element names in W3C XML Schema and should not be confused
with the simple or complex datatype of an element.
Containers that allow you to define, reference, and redefine groups
An element that has neither child element nor text nodes (with or
A constraint added to the lexical or value space of a simple datatype
during a derivation by restriction. The list of facets that can be
used depends on the simple datatype. Facets can be
"fixed" to disable their use during
Elements and datatypes that cannot be substituted or derived any
longer in the schema. A final element may not be chosen as the head
of a substitution group, while a final complex or simple type cannot
be used as a base for further derivation.
Facets that are "fixed" during a
derivation by restriction cannot be used during further derivations
A value that must match the value found in the instance document.
A fixed value is also used as the default if no value is supplied.
All the components (elements, attributes, simple and complex types,
element and attribute groups) can be defined at the top level of the
schema, directly under the xs:schema document
element. Their definition is said to be
"global," and they can be
referenced elsewhere in the schema, as well as in any schema that has
imported or included this schema.
XML Information Set. A formal description of the information that may
be found in a well-formed XML document.
An XML document that is a candidate to be validated by a schema. Any
well-formed XML 1.0 document that conforms to the Namespaces in XML
1.0 Recommendation can be considered a valid or invalid instance
The simple datatype that is used as the starting point to define a
new simple datatype using a derivation by list.
The set of all representations (after parsing and whitespace
processing) allowed for a simple datatype.
Most of the components (elements, attributes, simple and complex
types) can be defined inside of other components where they are used.
Their definition is said to be
"local" and they cannot be
referenced in other parts of the schema.
The name of a component in its namespace, i.e., the part of the
qualified name that comes after the namespace prefix.
The simple datatypes used as the starting point to define a new
simple datatype using a derivation by union.
The content of an element that contains both child element and text
A unique identifier that can be associated with a set of XML elements
and attributes. This identifier is a URI, which is not required to
point to an actual resource but must
"belong" to the author of these
elements and attributes. Since this full URI can't
be included in the name of each element and attribute, a namespace
prefix is assigned to the namespace URI through a namespace
declaration. This prefix is added to the local name of the elements
and attributes to form a qualified name. Namespaces are optional and
elements and attributes may have no namespaces attached. W3C XML
Schema has extended the scope of namespaces by using them not only
for elements and attributes but also for all the components of a
schema. A schema identifies the namespace of the components it describes
as its target namespace. When these components do not have
a namespace, the schema is said to have no target namespace.
The set of values that are sent by the parser to the applications. It
is at the interface between the parser and the schema validator.
Values from the parsed space undergo whitespace processing, as
defined by their simple datatype, to feed the lexical space. The
parsed space is, therefore, not visible by the facets.
An element, such as a compositor, a group of elements
(xs:group), an element definition or reference
(xs:element), or an element wildcard
(xs:any), which is included in a compositor to
define a list of elements. A restriction applies to
xs:all, which cannot be used as a particle even
though it is defined as a compositor. The number of occurrences of
particles may be constrained using their minOccurs
and maxOccurs attributes.
A facet that allows definition of a regular expression, which will be
applied to the lexical space to check its validity. By extension, the
regular expression defined in a pattern is often called
"pattern" as well.
Regular expressions (or patterns) are composed of pieces. Each piece
is itself composed of an atom describing a condition on a substring
and an optional quantifier defining the expected number of
occurrences of the atom.
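The decomposition into pieces can be illustrated with a short sketch. It uses Python's re module, whose syntax is close to but not identical to the one used by W3C XML Schema; the pattern and the helper name matches_pattern are invented for the example.

```python
import re

# Three pieces, each an atom followed by an optional quantifier:
#   [A-Z]{2}   atom: a character class        quantifier: exactly two
#   -          atom: a single character       no quantifier (exactly one)
#   \d+        atom: a classical Perl class   quantifier: one or more
PATTERN = r"[A-Z]{2}-\d+"

def matches_pattern(value):
    # A pattern facet has to match the whole lexical value, so
    # fullmatch() is used here rather than search().
    return re.fullmatch(PATTERN, value) is not None

print(matches_pattern("FR-75"))   # True
print(matches_pattern("FR-"))     # False: the last piece needs at least one digit
print(matches_pattern("xFR-75"))  # False: the whole value must match
```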
The simple datatypes (both primitive and derived) that are defined in
the W3C XML Schema Recommendation.
A simple datatype that cannot be defined by derivation from other
datatypes. There is no way to create primitive datatypes, so all the
primitive datatypes are therefore predefined.
The Post Schema Validation Infoset. The Infoset after the information
gathered during a schema validation is added.
qualified element or attribute
Elements and attributes that belong to a namespace; i.e., a namespace
URI is defined for them. The name of qualified elements may have no
prefix if a default namespace is defined, but the name of qualified
attributes must be prefixed.
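The different treatment of unprefixed elements and unprefixed attributes can be seen with Python's xml.etree.ElementTree, which reports qualified names as "{namespace-uri}local-name". The namespace URI, element name, and attribute names below are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a default namespace plus a prefix bound to the same URI.
doc = ET.fromstring(
    '<order xmlns="http://example.org/ns" '
    'xmlns:x="http://example.org/ns" status="open" x:priority="high"/>'
)

# The unprefixed element picks up the default namespace: it is qualified.
print(doc.tag)  # {http://example.org/ns}order

# The unprefixed attribute does not: only the prefixed attribute is qualified.
print(sorted(doc.attrib))  # ['status', '{http://example.org/ns}priority']
```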
The complete name of a component, including the prefix associated to
its target namespace if one is defined.
Relational DataBase Management System. Developed in the late 1970s,
these systems have taken most of the database market and host a
significant amount of the data of many organizations. XML Schema
languages may help to ensure the interface between that information
and XML documents.
Specifications published by the W3C. They cannot be officially called
"standards," since the W3C is a
consortium that does not have the status of a standards body,
which is reserved for the ISO and national standards bodies. The
specifications that are finalized and approved by the Director are
then called "W3C Recommendations."
All of the components (elements, attributes, simple and complex
types, element and attribute groups) that have been created with a
global definition can be referenced when needed in the schema in
which they are defined, and in any schema that has imported or
included this schema. Their definition is used at the location where
they are referenced.
A syntax to express conditions on strings. The syntax used by the W3C
XML Schema for its patterns is very close to the syntax introduced by
the Perl programming language. A regular expression is composed of
A grammar-based XML Schema language developed by Murata Makoto and
published in March 2000 as a Japanese ISO Standard (see http://www.xml.gr.jp/relax).
A grammar-based XML Schema language resulting from a merger between
RELAX and TREX (see http://relaxng.org).
Simple API for XML. A streaming event-based API used between parsers
and applications. Its streaming nature means that pipelines of XML
processing may be created using SAX (see http://www.saxproject.org).
A rule-based XML Schema language, developed by Rick Jelliffe, using
XPath expressions to describe validation rules (see http://www.ascc.net/xml/resource/schematron/schematron.html).
The set of values as they are stored in a document. These values are
transformed by the parser, as defined in the Recommendation XML 1.0,
before reaching the application. The serialization space is not
visible to the schema processors.
Standard Generalized Markup Language. Created in 1980, the ancestor
of XML. XML was designed as a simplified subset of SGML to be used on
An element has a simple content model when it has a child text node
only (and no subelements). A simple content element has a simple type
if it has no attributes, and it has a complex type if it has any
A datatype that accepts only a text value. Simple datatypes can be
directly assigned to attributes and simple content elements that do
not accept any attribute. Simple datatypes can be used to define
complex datatypes by extension.
The major XML protocol used by Web Services; relies on W3C XML Schema
to describe the messages exchanged (see http://www.w3.org/TR/SOAP).
W3C XML Schema uses the term
"space" to mean a set of values
(lexical versus value spaces). For completeness, we introduced two
additional spaces in this book (the serialization and parsed spaces).
A character that may be used as an atom after a
"\" to accept a specific character,
either for convenience or because this character is interpreted
differently in the context of a regular expression.
A feature of W3C XML Schema, allowing you to define groups of
elements that may be used interchangeably in instance documents. They
are not declared as element groups, but through the
substitutionGroup attribute of
xs:element global definitions.
The namespace of the components described in a schema. When these
components do not have a namespace, the schema is said to have no
A grammar-based XML Schema language developed by James Clark (see
A set of characters classified by their
"localization" (Latin, Arabic,
Hebrew, Tibetan, and even Gothic or musical symbols).
A set of characters classified by their usage (letters, uppercase,
digit, punctuation, etc.).
Unicode character class
A set of character classes defined based on the Unicode blocks and
unqualified element or attribute
Elements and attributes that don't belong to a
namespace; i.e., no namespace URI is defined for them. Any unprefixed
attribute is unqualified, but unprefixed elements are unqualified
only if no default namespace is defined.
The UPA (Unique Particle Attribution) rule states that at any given
moment, a W3C XML Schema processor must know—without ambiguity
and without needing any forward reference in the document—which
particle in the schema describes an element in the instance document.
This rule is roughly equivalent to the restrictions known as
"non-deterministic content models"
for the XML 1.0 DTDs and as "ambiguous content
models" by SGML. The UPA rule is often associated
with the "Consistent Declaration
Uniform Resource Identifier. Defined by RFCs 2396 and 2732. URIs
were created to extend the notion of URLs (Uniform Resource Locators)
to include abstract identifiers that do not necessarily need to
"locate" a resource.
Uniform Resource Locator, a common identifier used on the Web. URLs
are absolute when the full path to the resource is indicated, and
relative when a partial path is given that needs to be resolved
against a base URL.
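The difference between the two kinds of URL can be sketched with urljoin from Python's standard library. The URLs are placeholders chosen for the example.

```python
from urllib.parse import urljoin

# Hypothetical base URL, for instance the location of a schema document.
base = "http://example.org/schemas/library.xsd"

# A relative URL is evaluated against the base URL.
relative_result = urljoin(base, "common/types.xsd")
print(relative_result)  # http://example.org/schemas/common/types.xsd

# An absolute URL already gives the full path and is returned unchanged.
absolute_result = urljoin(base, "http://example.com/other.xsd")
print(absolute_result)  # http://example.com/other.xsd
```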
user-defined character class
Datatypes that are defined in a schema. All the datatypes can be
defined by derivation or, for the complex datatypes only, by
An XML document that is well-formed and conforms to a schema (DTD, W3C
XML Schema, etc.) of some kind.
The set of all the possible values for a simple datatype, independent
of their actual representation in the instance documents.
World Wide Web Consortium. Originally created to settle HTML and HTTP
as de facto standards. The main specification body for the core
specifications of the World Wide Web and the keeper of the core XML
specifications (see http://www.w3.org).
An approach to using the Web for applications, as opposed to the Web
for human consumption that we use on a daily basis. Those services
rely on the same infrastructure as the Web, and exchange XML
documents over HTTP through a layer of protocols (such as SOAP or
XML-RPC), which are themselves based on XML. XML Schema languages are
used by these services to describe and control the XML documents that
An XML document that meets the conditions defined in the XML 1.0
Recommendation: it must be readable without ambiguity. Syntax errors
will be detected by an XML parser without a schema of any type.
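A minimal sketch of such a check, using the non-validating parser behind Python's xml.etree.ElementTree. No schema or DTD is involved. The helper name is_well_formed is invented for the example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser can read the document, False on a syntax error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<a><b/></a>"))   # True
print(is_well_formed("<a><b></a>"))    # False: <b> is never closed
```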
Characters #x9 (tab), #xA
(linefeed), #xD (carriage return), and
#x20 (space). These are often used to indent the
XML documents to give them a more readable aspect, and are filtered
by an operation named "whitespace
The action of applying the whitespace replacement, trimming the
leading and trailing spaces, and replacing all the sequences of
contiguous whitespaces by a single space between the parsed and
lexical spaces. Most of the simple datatypes apply whitespace
The action of preserving all the whitespaces from the parsed to the
lexical space. The xs:string datatype and the
user-defined simple types derived from xs:string
(which do not change the value of the
xs:whiteSpace facet) are the only datatypes
applying whitespace preservation.
The operation of filtering that is done on the whitespaces present in
the value of a simple datatype. The whitespace processing is done
during the transformation between parsed and lexical spaces. W3C XML
Schema defines three whitespace processing approaches (depending on
the simple type): whitespace preservation, whitespace replacement,
and whitespace collapsing.
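The three approaches named above can be sketched as one function that maps a value from the parsed space to the lexical space. The function name process_whitespace and its mode argument are assumptions made for this illustration. A real schema processor selects the approach from the simple type.

```python
import re

def process_whitespace(parsed_value, mode):
    """Apply 'preserve', 'replace', or 'collapse' to a parsed-space value."""
    if mode == "preserve":
        return parsed_value
    # Replacement: each tab, linefeed, and carriage return becomes one space,
    # so the length of the string does not change.
    replaced = re.sub(r"[\t\n\r]", " ", parsed_value)
    if mode == "replace":
        return replaced
    if mode == "collapse":
        # Collapsing: replacement first, then runs of spaces are reduced to
        # a single space and leading and trailing spaces are trimmed.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError("unknown whitespace mode: " + mode)

raw = "  a\tb\n\n c  "
print(repr(process_whitespace(raw, "replace")))   # '  a b   c  '
print(repr(process_whitespace(raw, "collapse")))  # 'a b c'
```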
The action of replacing all the occurrences of the characters
#x9 (tab), #xA (linefeed), and
#xD (carriage return) by a #x20
(space) between the parsed and the lexical space. Whitespace
replacement doesn't change the length of the string.
xs:normalizedString and the user-defined simple
types derived from xs:string and
xs:normalizedString (for which the value of the
xs:whiteSpace facet is
"replace") are the only datatypes
that apply whitespace replacement.
A character used as an atom in a regular expression to accept a set
of characters. W3C XML Schema supports only one such wildcard: the
character ".", which means
"any character." This expression is
also used to designate the xs:any and
The XML parser developed by the XML Apache project (see http://xml.apache.org/xerces2-j/index.html).
A W3C specification defining a general purpose inclusion mechanism
for XML documents (see http://www.w3.org/TR/xinclude).
XML Linking Language is a W3C Recommendation (http://www.w3.org/TR/xlink)
"which allows elements to be inserted into XML
documents in order to create and describe links between
Extensible Markup Language. A subset of SGML created to be used on
the Web. Its core specification (XML 1.0) was published by the W3C in
February 1998. New specifications have been added since this date,
and the W3C considers that, with the addition of W3C XML Schema, the
core specifications are now complete.
Considered the ancestor of SOAP, XML-RPC is a simple XML protocol
that may be used to implement Web Services. It does not rely on the
W3C XML Schema to describe the content of its messages but has
defined a simpler binding mechanism (see http://www.xmlrpc.com).
A query language used to identify a set of nodes within an XML
document. Originally defined to be used with XSLT, it is also used by
XPointer and a simple subset is used in the
xs:key, xs:keyref, and
xs:unique W3C XML Schema elements. The XQuery
specification will be a superset of the second version of XPath. This
version will use type information provided by W3C XML Schema (see
XML Query language. This will be a superset of XPath 2.0 that will
use type information provided by the W3C XML Schema to optimize its
queries, and for features such as sort orders (see http://www.w3.org/TR/xquery).
Extensible Stylesheet Language Transformations. A programming
language specialized for the transformation of XML documents (see
An open source W3C XML Schema implementation available at http://www.w3.org/2001/03/webdata/xsv.
Copyright © 2002 O'Reilly & Associates. All rights reserved.