Chapter 11. Referencing Schemas and Schema Datatypes in XML Documents

So far, we have seen how W3C XML Schemas can be written outside XML documents without touching the actual instance documents. In this chapter, we will introduce a new namespace to be used inside XML documents to provide information for use by schema processors. This information may identify the location of the associated schemas, as well as further identify the schema types used, which opens a new level of flexibility and interaction between schemas and instance documents in the design of XML applications.

This namespace (which is the same for all W3C XML Schema meta-information located in the instance documents themselves) is http://www.w3.org/2001/XMLSchema-instance. The prefix usually used to designate this namespace is xsi. The namespace defines only four attributes, which are considered valid in any element of any instance document without being declared in the schema.
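The four attributes are xsi:schemaLocation, xsi:noNamespaceSchemaLocation, xsi:type, and xsi:nil. The fragment below is only a sketch of how three of them can appear together in an instance document. The file order.xsd and the element names are invented for illustration, and the fragment assumes a schema that declares comment as nillable. xsi:schemaLocation plays the same role as xsi:noNamespaceSchemaLocation for schemas that have a target namespace.

```xml
<?xml version="1.0"?>
<!-- Hypothetical instance document; order.xsd and the element names are
     invented for illustration. -->
<order xsi:noNamespaceSchemaLocation="order.xsd"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- xsi:type names the schema type to use for this element -->
  <quantity xsi:type="xs:positiveInteger">3</quantity>
  <!-- xsi:nil marks an element as deliberately empty; the schema must
       declare the element as nillable for this to be valid -->
  <comment xsi:nil="true"/>
</order>
```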

11.1. Associating Schemas with Instance Documents

The first piece of information that may be useful for a schema processor is a set of hints about the locations where it might find schemas relevant to the instance document. This feature is similar to the SYSTEM identifier of the XML doctype declaration, but with some important differences. The first difference is that a single schema may not be enough to describe a document, since each schema might describe only one namespace (or the lack of a namespace), and the composition of the schemas can be done in the instance document. The second difference is that the locations indicated in the instance documents are only considered hints and may be overridden by the user or by the schema processor.

The hard link between an XML document and its DTD has fed many debates in the XML community. Many developers remember what happened when Netscape restructured its web site. The address of the DTD for RSS 0.91 (http://my.netscape.com/publish/formats/rss-0.91.dtd) suddenly returned a 404 error, breaking hundreds of applications. Another motivation for "soft" links between instance documents and their schemas is that they allow the application of different schemas, depending on local business rules. For instance, a supplier receiving an order in an XML document may have specific rules and want to check the document against its own schema.

For all these reasons, the Recommendation states that if a schema processor finds such information in a document, it should try to retrieve the schemas at the locations indicated, but it could be directed otherwise by the invoking application or user. When such information is missing, a schema processor may also be directed by the invoking application to dereference specified locations. When no information is provided at all by the invoking application or in the instance document, the schema processor is free to try any method to find a schema. Among the methods mentioned in this case, a schema processor may try to load the resource that may be available at the namespace URI to see if a schema is published there, but it could use other techniques as well, such as RDDL or catalog systems.

W3C XML Schema defines two attributes for this purpose: one holds a list of schema locations associated with target namespaces, and the other holds the location of a schema without a target namespace. The attribute to use when there is no target namespace is xsi:noNamespaceSchemaLocation, and its value is a URI pointing to the corresponding schema. Although this attribute can be used without a declaration in any element of any instance document, it must be found by the schema processor before the processor needs it to validate any element or attribute (i.e., no later than the first element without a namespace found in the document). Furthermore, its scope is global to the entire document and it cannot be redefined.

In practice, the xsi:noNamespaceSchemaLocation attribute will often be located in the document element. For the example used in this chapter (which doesn't use any namespace), we can reference a schema named first.xsd, located in the same directory as the instance document, as follows:

<library xsi:noNamespaceSchemaLocation="first.xsd"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  .../...
</library>

To reference schemas with a target namespace, lists of URIs must be provided in xsi:schemaLocation attributes. This list is, in fact, a list of pairs of URIs. In each pair, the first URI identifies a target namespace and the second URI identifies the location of a schema with this target namespace. The same rule that applied to xsi:noNamespaceSchemaLocation applies to the location of this attribute: for each target namespace for which you want to provide a schema location, you need to provide this information before a schema processor needs it to do its job.

To illustrate the usage of the xsi:schemaLocation attribute, let's examine a simplified version of an example with the two namespaces that are described in Chapter 10, "Controlling Namespaces". The instance document is as follows (without any xsi:schemaLocation):

<?xml version="1.0"?> 
<book id="b0836217462" xmlns="http://dyomedea.com/ns/library"
  xmlns:mkt="http://dyomedea.com/ns/library/mkt">
  <title>
    Being a Dog Is a Full-Time Job
  </title>
  <author>
    Charles M Schulz
  </author>
  <mkt:cover>
    Paperback
  </mkt:cover>
  <mkt:pages>
    128
  </mkt:pages>
</book>

We have an open schema for the main namespace that allows arbitrary elements from other namespaces, such as:

<?xml version="1.0"?> 
<xs:schema targetNamespace="http://dyomedea.com/ns/library"
  elementFormDefault="qualified" attributeFormDefault="unqualified"
  xmlns:lib="http://dyomedea.com/ns/library"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:token"/> 
        <xs:element name="author" type="xs:token"
          maxOccurs="unbounded"/> 
        <xs:any namespace="##other" processContents="lax"
          minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

We also have a schema for the two elements that belong to the namespace for our marketing department:

<?xml version="1.0"?> 
<xs:schema targetNamespace="http://dyomedea.com/ns/library/mkt"
  elementFormDefault="qualified" attributeFormDefault="unqualified"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="cover" type="xs:NMTOKEN"/>
  <xs:element name="pages" type="xs:nonNegativeInteger"/>
</xs:schema>

This example was carefully chosen to have two schemas for two namespaces that are not linked together: the library schema makes no reference to the marketing namespace, and vice versa. The outcome of validation therefore depends on the hints given to the schema processor. If we validate the instance document without any xsi:schemaLocation attribute or any other information from the command line or application, the schema processor is left on its own to locate a schema. Depending on the algorithm it implements, it may try to dereference the namespace URI of the document element (i.e., attempt to load a resource that may be available there). In our case, this is http://dyomedea.com/ns/library. If there is no schema there, the processor can't say whether the document is valid. Alternatively, the schema processor can try to retrieve an RDDL document at this location, hoping to find a reference to a schema in it.
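As a rough sketch of the RDDL option, the fragment below shows what a resource entry pointing to a W3C XML Schema might look like inside an RDDL document published at the namespace URI. Such an entry normally sits inside an XHTML page. The title and the relative location library.xsd are assumptions made for illustration, and whether a given schema processor looks for such an entry at all depends on the tool.

```xml
<!-- Fragment of a hypothetical RDDL document served at
     http://dyomedea.com/ns/library -->
<rddl:resource xmlns:rddl="http://www.rddl.org/"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xlink:title="W3C XML Schema for the library namespace"
  xlink:role="http://www.w3.org/2001/XMLSchema"
  xlink:arcrole="http://www.rddl.org/purposes#schema-validation"
  xlink:href="library.xsd"/>
```

Here xlink:role states the nature of the resource (a W3C XML Schema), xlink:arcrole states its purpose (schema validation), and xlink:href gives its location.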

More typically, the author of the instance document may be kind enough to give the location of the schema for the library namespace -- for instance:

<?xml version="1.0"?> 
<book xsi:schemaLocation="http://dyomedea.com/ns/library library.xsd"
  id="b0836217462" xmlns="http://dyomedea.com/ns/library"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:mkt="http://dyomedea.com/ns/library/mkt">
  <title>
    Being a Dog Is a Full-Time Job
  </title>
  <author>
    Charles M Schulz
  </author>
  <mkt:cover>
    Paperback
  </mkt:cover>
  <mkt:pages>
    128
  </mkt:pages>
</book>

We don't have any choice about the location of xsi:schemaLocation because the information is needed to validate the document element. If we want to include it, we must locate it in the document element. This attribute contains a single pair of values separated by a space:

"http://dyomedea.com/ns/library library.xsd"

As mentioned, the first value identifies the target namespace while the second value identifies the schema location. With this information at hand, the processor can read the schema and start validating the instance document. However, when it encounters elements from the marketing namespace, which match the xs:any wildcard whose processContents attribute asks for validation when possible, it may again try to find a schema for this namespace by dereferencing the namespace URI. If it can find a schema, it validates the elements from the marketing namespace; if not, it considers them valid, since the processContents attribute is set to "lax".
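For comparison, the fragment below is a hypothetical variant of the wildcard from the library schema, shown only as an illustration and not used in the rest of the example. With processContents set to "strict", the processor must find a declaration for each element matched by the wildcard and validate against it, so a marketing schema that cannot be located becomes a validation error instead of being tolerated.

```xml
<!-- Hypothetical variant of the wildcard in the library schema:
     "strict" requires a declaration for every matched element -->
<xs:any namespace="##other" processContents="strict"
  minOccurs="0" maxOccurs="unbounded"/>
```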

If we want to improve our chances of finding a schema for the marketing namespace, we can also give its location in an xsi:schemaLocation attribute. The last place in the instance document where we can provide this information is the first element that uses this namespace, such as:

<?xml version="1.0"?> 
<book id="b0836217462"
  xsi:schemaLocation="http://dyomedea.com/ns/library library.xsd"
  xmlns="http://dyomedea.com/ns/library"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:mkt="http://dyomedea.com/ns/library/mkt">
  <title>
    Being a Dog Is a Full-Time Job
  </title>
  <author>
    Charles M Schulz
  </author> 
  <mkt:cover xsi:schemaLocation="http://dyomedea.com/ns/library/mkt
    marketing.xsd">
    Paperback
  </mkt:cover>
  <mkt:pages>
    128
  </mkt:pages>
</book>

The schema processor now has all the hints it needs to retrieve the schemas for both namespaces, and it should fully validate the elements that belong to the marketing namespace. Alternatively, we can place all the schema location hints in the same xsi:schemaLocation attribute:

<?xml version="1.0"?> 
<book id="b0836217462"
  xsi:schemaLocation="http://dyomedea.com/ns/library library.xsd
  http://dyomedea.com/ns/library/mkt marketing.xsd"
  xmlns="http://dyomedea.com/ns/library"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:mkt="http://dyomedea.com/ns/library/mkt">
  <title>
    Being a Dog Is a Full-Time Job
  </title>
  <author>
    Charles M Schulz
  </author>
  <mkt:cover>
    Paperback
  </mkt:cover>
  <mkt:pages>
    128
  </mkt:pages>
</book>

TIP: In these examples, we used relative URIs to locate the schemas. This is a good solution only if you assume that the schemas will be moved with the instance documents, and in many cases, absolute URIs will be preferred. When this is the case, they can be mapped back into local resources by a mechanism such as XML Catalogs (http://www.oasis-open.org/committees/entity/spec-2001-08-06.html), an OASIS specification that is implemented by an increasing number of tools.
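A minimal sketch of such a mapping, in the syntax of the OASIS XML Catalogs specification, might look like the following. The absolute schema URIs and the local paths are assumptions made for illustration: they suppose that the instance document's xsi:schemaLocation attribute uses these absolute URIs as schema locations. Whether a schema processor consults a catalog, and which kind of catalog entry it looks up for schema locations, depends on the tool.

```xml
<?xml version="1.0"?>
<!-- Sketch of an XML Catalog; the absolute URIs and the local paths
     are hypothetical. -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Map each published schema location to a local copy -->
  <uri name="http://dyomedea.com/ns/library/library.xsd"
    uri="schemas/library.xsd"/>
  <uri name="http://dyomedea.com/ns/library/mkt/marketing.xsd"
    uri="schemas/marketing.xsd"/>
</catalog>
```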
