Complex Content Models (XML Schema)

Restricting or extending simple content models is useful, but XML is not very useful without more complex models.

7.4.2. Derivation of Complex Content

Complex content can also be derived, by extension or by restriction, from complex types. Before we see the details of these mechanisms, note that they are not symmetrical and that their semantics are very different. Deriving complex content by restriction restricts the set of matching instances: all the instance structures that match the restricted complex type must also match the base complex type. Deriving complex content by extension of a complex type extends the content model by adding new particles: content that matches the base type does not necessarily match the extended complex type. This also means that there is no "roundtrip": in the general case, neither a restricted complex type nor an extended one can be extended or restricted back into its base type.

7.4.2.1. Derivation by extension

Derivation by extension is similar to the extension of simple content complex types. It is functionally very similar to joining groups of elements and attributes to create a new complex type. The idea behind this feature is to let people add new elements and attributes after those already defined in the base type. This is virtually equivalent to creating a sequence with the current content model followed by the new content model. Let's go back to our library to illustrate this. The content models of our elements author and character are relatively similar: author expects name, born, and dead, while character expects name, born, and qualification. If we want to use a derivation by extension, we can first create a base type that contains the leading elements common to the content models of both elements:

<xs:complexType name="basePerson">
  <xs:sequence>
    <xs:element ref="name"/>
    <xs:element ref="born"/>
  </xs:sequence>
  <xs:attribute ref="id"/>
</xs:complexType>

It is then possible to use derivations by extension to append new elements (dead for author and qualification for character) after those that have already been defined in the base type:

<xs:element name="author">
  <xs:complexType>
    <xs:complexContent>
      <xs:extension base="basePerson">
        <xs:sequence>
          <xs:element ref="dead" minOccurs="0"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:element>
             
<xs:element name="character">
  <xs:complexType>
    <xs:complexContent>
      <xs:extension base="basePerson">
        <xs:sequence>
          <xs:element ref="qualification"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:element>
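As a quick check of what these extended types accept, here are two instance fragments. This is a sketch under a few assumptions: the global name, born, dead, and qualification elements and the id attribute come from the library schema and accept these illustrative values (born and dead are assumed to hold dates). Note that the appended particles always follow name and born.

```xml
<!-- Valid author: name and born come from basePerson,
     then the optional dead appended by the extension -->
<author id="CMS">
  <name>Charles M Schulz</name>
  <born>1922-11-26</born>
  <dead>2000-02-12</dead>
</author>

<!-- Valid character: name and born, then the mandatory
     qualification appended by the extension -->
<character id="PP">
  <name>Peppermint Patty</name>
  <born>1966-08-22</born>
  <qualification>bold, brash and tomboyish</qualification>
</character>
```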

Technically, the meaning of this derivation is equivalent to creating a sequence containing the compositor used to define the base type as well as the compositor included in the xs:extension element. Thus, the content models of these elements are similar to the content models defined as:

<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
      </xs:sequence>
      <xs:sequence>
        <xs:element ref="dead" minOccurs="0"/>
      </xs:sequence>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
             
<xs:element name="character">
  <xs:complexType>
    <xs:sequence>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
      </xs:sequence>
      <xs:sequence>
        <xs:element ref="qualification"/>
      </xs:sequence>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>

This equivalence clearly shows the nature of this derivation mechanism. As stated in the introduction to complex content derivation mechanisms, this is not an extension of the set of valid instance structures. A character element, with its mandatory qualification, does not match the basePerson content model; its content model is the merge of two content models. This merge is itself subject to limitations: you cannot choose the point where the new content model is inserted, since the addition is always done by appending the new compositor after the one of the base type. In our example, if the common elements name and born had not been the first two elements, we couldn't have used a derivation by extension.
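These limits show up directly on instances. In the sketch below (same assumptions about the library's global declarations, illustrative values), the first fragment matches basePerson but not the extended character type, and the second is rejected because dead is appended after the base sequence and cannot move before born.

```xml
<!-- Matches basePerson, but NOT valid per the extended character type:
     the mandatory qualification is missing -->
<character id="PP">
  <name>Peppermint Patty</name>
  <born>1966-08-22</born>
</character>

<!-- NOT valid per the extended author type: the extension appends dead
     after name and born, so dead cannot appear before born -->
<author id="CMS">
  <name>Charles M Schulz</name>
  <dead>2000-02-12</dead>
  <born>1922-11-26</born>
</author>
```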

Another caveat in derivations by extension is that we can't choose the compositor used to merge the two content models. This means that when we extend content models that use xs:choice as their compositor, it is not the scope of the choice that is extended; rather, the choice is included in an xs:sequence. We could, for instance, extend the content model of the element persons, which we just created and which could be defined as a global complex type:

<xs:complexType name="basePersons">
  <xs:choice minOccurs="0" maxOccurs="unbounded">
    <xs:element ref="author"/>
    <xs:element ref="character"/>
  </xs:choice>
</xs:complexType>

If we add a new element using a derivation by extension:

<xs:complexType name="persons">
  <xs:complexContent>
    <xs:extension base="basePersons">
      <xs:sequence> 
        <xs:element name="editor" type="xs:token" minOccurs="0"
          maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

The result is a content type that is equivalent to:

<xs:complexType name="personsEquivalent">
  <xs:sequence>
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element ref="author"/>
      <xs:element ref="character"/>
    </xs:choice>
    <xs:sequence> 
      <xs:element name="editor" type="xs:token" minOccurs="0"
        maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:sequence>
</xs:complexType>

There is no way to obtain an extension of the xs:choice such as:

<xs:complexType name="personsAsWeWouldHaveLiked">
  <xs:choice minOccurs="0" maxOccurs="unbounded">
    <xs:element ref="author"/>
    <xs:element ref="character"/>
    <xs:element name="editor" type="xs:token"/>
  </xs:choice>
</xs:complexType>
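The practical difference is visible in instance documents. This sketch assumes a hypothetical global declaration <xs:element name="persons" type="persons"/>, and the children of author and character are elided for brevity (real instances need their full content). The second fragment would be accepted by personsAsWeWouldHaveLiked, but it is rejected by the extended persons type.

```xml
<!-- Valid per persons: author and character in any order, then the editors -->
<persons>
  <character id="PP"><!-- name, born, qualification elided --></character>
  <author id="CMS"><!-- name, born, dead elided --></author>
  <editor>Jane Doe</editor>
  <editor>John Doe</editor>
</persons>

<!-- NOT valid per persons: an editor cannot appear before a character,
     since the editor sequence is appended after the whole xs:choice -->
<persons>
  <author id="CMS"><!-- elided --></author>
  <editor>Jane Doe</editor>
  <character id="PP"><!-- elided --></character>
</persons>
```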

The situation with xs:all is even worse: the restrictions on the composition of xs:all still apply. This means that you can't add any content to a complex type defined with an xs:all (although you can still add new attributes), and that you can only use an xs:all compositor in a derivation by extension if the base type has an empty content model.
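Within these rules, the two extensions that remain possible can be sketched as follows. The type names idOnly, allPerson, and allPersonWithNote and the note attribute are hypothetical, and ref="id" assumes the library's global id attribute.

```xml
<!-- Base type with an empty content model: only an attribute -->
<xs:complexType name="idOnly">
  <xs:attribute ref="id"/>
</xs:complexType>

<!-- Allowed: the base content is empty, so xs:all can be used
     as the compositor of the extension -->
<xs:complexType name="allPerson">
  <xs:complexContent>
    <xs:extension base="idOnly">
      <xs:all>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
      </xs:all>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

<!-- Allowed: no new content, only a new attribute -->
<xs:complexType name="allPersonWithNote">
  <xs:complexContent>
    <xs:extension base="allPerson">
      <xs:attribute name="note" type="xs:string"/>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
```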

7.4.2.2. Derivation by restriction

Whereas derivation by extension is similar to merging two content models through a xs:sequence compositor, derivation by restriction is a restriction of the number of instance structures matching the complex type. In this respect, it is similar to the derivation by restriction of simple datatypes or simple content complex types (even though we've seen that a facet such as xs:whiteSpace expanded the number of instance documents matching a simple type). Note that this is the only similarity between derivations by restriction of simple and complex datatypes. This is highly confusing, since W3C XML Schema uses the same word and even the same element name in both cases, but these words have a different meaning and the content models of the xs:restriction elements are different.
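To make the difference concrete, here is a sketch of the two uses of xs:restriction side by side; the type names shortName and authorsOnly are hypothetical. In a simple type, xs:restriction holds facets; in complex content, it holds a complete content model that must accept a subset of what its base type (here basePersons) accepts.

```xml
<!-- Simple type restriction: xs:restriction holds facets -->
<xs:simpleType name="shortName">
  <xs:restriction base="xs:token">
    <xs:maxLength value="32"/>
  </xs:restriction>
</xs:simpleType>

<!-- Complex content restriction: xs:restriction holds a full content
     model; the choice between author and character is reduced to author -->
<xs:complexType name="authorsOnly">
  <xs:complexContent>
    <xs:restriction base="basePersons">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="author"/>
      </xs:choice>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
```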

Unlike simple type derivation, there are no facets to apply to complex types, and the derivation is done by defining the full content model of the derived datatype, which must be a logical restriction of the base type. Any instance structure valid per the derived datatype must also be valid per the base datatype. The W3C XML Schema specification does not define the derivation by restriction in these terms, but defines a formal algorithm to be followed by schema processors, which is roughly equivalent.

The derivation by restriction of a complex type is a declaration of intention that the derived type is a subset of the base type. (Unlike the derivations we've seen for simple types, it does not build the new type for us: this declaration is needed for the features allowing substitutions and redefinitions of types, which we will see in Chapter 8, "Creating Building Blocks" and Chapter 12, "Creating More Building Blocks Using Object-Oriented Features", and it may provide useful information to some applications.) When we derive simple types, we can take a base type without having to care about the details of the facets that are already applied, and just add our own set of facets. Here, on the contrary, we need to provide a full definition of the content model. Attributes are the exception: they are inherited from the base type and need to be declared only when they are modified, for instance when they are declared as "prohibited" to exclude them from the restriction, as we have seen for the restriction of complex types with simple content.
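For instance, a restriction of basePerson that only removes the id attribute (which is optional in basePerson, since it has no use attribute) still has to restate the whole sequence, even though it is unchanged. A sketch with the hypothetical type name basePersonWithoutId:

```xml
<xs:complexType name="basePersonWithoutId">
  <xs:complexContent>
    <xs:restriction base="basePerson">
      <!-- The unchanged content model must still be repeated in full -->
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
      </xs:sequence>
      <!-- Only the modified attribute is declared: id is excluded -->
      <xs:attribute ref="id" use="prohibited"/>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
```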

Moving on, let's try to find a base type from which we can derive both the author and character elements by restriction. This time, we can be sure that such a complex type exists, since all complex types are ultimately derived from xs:anyType, which allows any elements and attributes. In practice, however, we will try to find the most restrictive base type that can accommodate our needs. Since the name and born elements are present in both author and character, with the same number of occurrences, we can keep them as they appear. That leaves two elements, dead and qualification, each of which appears in only one of author and character. Since both author and character will need to be valid per the base type, we will include both of them in the base type but make them optional by giving them a minOccurs attribute equal to 0. Our base type can then be:

<xs:complexType name="person">
  <xs:sequence>
    <xs:element ref="name"/>
    <xs:element ref="born"/>
    <xs:element ref="dead" minOccurs="0"/>
    <xs:element ref="qualification" minOccurs="0"/>
  </xs:sequence>
  <xs:attribute ref="id"/>
</xs:complexType>
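Even this most restrictive base type accepts more than author and character together. A sketch, assuming a hypothetical global declaration <xs:element name="person" type="person"/>, that born and dead accept dates, and illustrative values throughout:

```xml
<!-- Valid per person, and per the author restriction that follows:
     neither dead nor qualification is present -->
<person id="P1">
  <name>Jane Doe</name>
  <born>1950-01-01</born>
</person>

<!-- Valid per person, but per neither author nor character:
     both optional elements are present -->
<person id="P2">
  <name>Jane Doe</name>
  <born>1950-01-01</born>
  <dead>2020-01-01</dead>
  <qualification>illustrative text</qualification>
</person>
```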

The derivations are then done by defining the content model within an xs:restriction element (note that we have not repeated the attribute declarations, which are not modified and are inherited from the base type):

<xs:element name="author">
  <xs:complexType>
    <xs:complexContent>
      <xs:restriction base="person">
        <xs:sequence>
          <xs:element ref="name"/>
          <xs:element ref="born"/>
          <xs:element ref="dead" minOccurs="0"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:element>
             
<xs:element name="character">
  <xs:complexType>
    <xs:complexContent>
      <xs:restriction base="person">
        <xs:sequence>
          <xs:element ref="name"/>
          <xs:element ref="born"/>
          <xs:element ref="qualification"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:element>

We see here that the syntax of a derivation by restriction is more verbose than the syntax of the straight definition of the content model. The purpose of this derivation is not to build modular schemas, but rather to give applications that use this schema the indication that there is some commonality between the content models, and if they know how to handle the complex type "person," they can handle the elements author and character. We will see W3C XML Schema features that rely on this derivation method in Chapter 8, "Creating Building Blocks" and Chapter 12, "Creating More Building Blocks Using Object-Oriented Features".

Changing the number of occurrences of particles is not the only modification that can be done during a derivation by restriction. Other operations that reduce the number of valid instance structures are also possible, such as changing a simple type to a more restrictive one or fixing values. The main constraint in this mechanism is that each particle of the derived type must be an explicit derivation of the corresponding particle of the base type. The effect of this statement is to limit the "depth" of the restrictions that can be performed in a single step: when we need to restrict particles at a deeper level of nesting, we may have to transform local definitions into global ones. We will see a concrete example in Section 7.5.1, "Creating Mixed Content Models", since mixed content models are similar in this respect.
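Both kinds of modification can be sketched on a hypothetical base type, localPerson, which declares its elements locally so that the example does not depend on the types of the library's global elements. Each particle of the hypothetical derived type fixedQualificationPerson restricts the corresponding base particle: xs:token is derived from xs:string by restriction, and qualification becomes mandatory with a fixed value.

```xml
<xs:complexType name="localPerson">
  <xs:sequence>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="born" type="xs:date"/>
    <xs:element name="qualification" type="xs:string" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="fixedQualificationPerson">
  <xs:complexContent>
    <xs:restriction base="localPerson">
      <xs:sequence>
        <!-- More restrictive simple type: xs:token is derived from xs:string -->
        <xs:element name="name" type="xs:token"/>
        <!-- Unchanged particle, still repeated in full -->
        <xs:element name="born" type="xs:date"/>
        <!-- Now mandatory (minOccurs defaults to 1) and with a fixed value -->
        <xs:element name="qualification" type="xs:string"
          fixed="cartoon character"/>
      </xs:sequence>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
```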

7.4.2.3. Asymmetry of these two methods

We now have all the elements we need to look back at the claim about the asymmetry of these derivation methods. This lack of symmetry is not a defect as such, but studying it is a good exercise for understanding the meaning of these two derivation methods. Let's examine the derivation by extension of basePerson into the character element:

<xs:complexType name="basePerson">
  <xs:sequence>
    <xs:element ref="name"/>
    <xs:element ref="born"/>
  </xs:sequence>
  <xs:attribute ref="id"/>
</xs:complexType>
             
<xs:element name="character">
  <xs:complexType>
    <xs:complexContent>
      <xs:extension base="basePerson">
        <xs:sequence>
          <xs:element ref="qualification"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:element>

The content model of character contains a mandatory qualification element, so valid characters are not valid per basePerson. There is thus no hope of deriving character back into basePerson by restriction, since in a derivation by restriction all the instance structures that are valid per the derived type must be valid per the base type.
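To see the failure concretely, we can give the character type a name and try to restrict it back. In this sketch, characterType and backToBasePerson are hypothetical names; a conforming schema processor should reject backToBasePerson because it drops the mandatory qualification particle of its base type.

```xml
<!-- Hypothetical named version of the anonymous character type -->
<xs:complexType name="characterType">
  <xs:complexContent>
    <xs:extension base="basePerson">
      <xs:sequence>
        <xs:element ref="qualification"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

<!-- Invalid restriction: qualification (minOccurs="1" by default)
     cannot be removed, so this type is rejected -->
<xs:complexType name="backToBasePerson">
  <xs:complexContent>
    <xs:restriction base="characterType">
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
      </xs:sequence>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
```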

Let's look back at the derivation by restriction of the person base type into a character element:

<xs:complexType name="person">
  <xs:sequence>
    <xs:element ref="name"/>
    <xs:element ref="born"/>
    <xs:element ref="dead" minOccurs="0"/>
    <xs:element ref="qualification" minOccurs="0"/>
  </xs:sequence>
  <xs:attribute ref="id"/>
</xs:complexType>
             
<xs:element name="character">
  <xs:complexType>
    <xs:complexContent>
      <xs:restriction base="person">
        <xs:sequence>
          <xs:element ref="name"/>
          <xs:element ref="born"/>
          <xs:element ref="qualification"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:element>

Again, it is not possible to derive the complex type of character back into person by extension, since that would mean changing the minimum number of occurrences of qualification from 1 to 0 and adding an optional dead element between born and qualification. Neither of these operations is possible during a derivation by extension, which can only append new content after the content of the base type: it can't update an existing particle (to change its number of occurrences) or insert a new particle between two existing particles.
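The closest we can get by extension is sketched below, with the hypothetical names restrictedCharacter and extendedCharacter. The dead element can only be appended after qualification, which stays mandatory, so the resulting content model is name, born, qualification, then an optional dead, which is not the content model of person.

```xml
<!-- Hypothetical named version of the restricted character type -->
<xs:complexType name="restrictedCharacter">
  <xs:complexContent>
    <xs:restriction base="person">
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
        <xs:element ref="qualification"/>
      </xs:sequence>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>

<!-- Valid extension, but dead lands after qualification,
     and qualification remains mandatory -->
<xs:complexType name="extendedCharacter">
  <xs:complexContent>
    <xs:extension base="restrictedCharacter">
      <xs:sequence>
        <xs:element ref="dead" minOccurs="0"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
```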
