Documenting Schemas (XML Schema)

The issue of documenting schemas (or any machine-readable language) goes beyond simply adding comments. The real challenge is to create schemas that are readable both by people looking directly at their source code and by documentation extraction tools.

14.1. Style Matters

Writing schemas is much like writing programs. Two pieces of code may both work, but one is more readable and maintainable than the other. Readability is good.

14.1.1. Keep It Simple

Although W3C XML Schema has been carefully specified so that schema processors can find their way through the most complex and intricate combinations of its many features, the same can't be expected of the average human reader. I must confess that I, for one, quickly get lost in the meanders of medium-complexity schemas, such as the famous schema for schemas.

"Keep it simple" is a useful principle. Although W3C XML Schema gives you a huge number of features, you don't need to use all of them in every schema. Each of them comes at a price in the readability of your schema.

Some of the rules for simplicity that we have used for some time with programming languages apply here, such as the conflicting rules for brevity ("If a function is more than one page long, split it"), and directness ("A function should be called more than once"). There are, of course, others such as "Put the code and the documentation in the same place," "If you can't say it in English, you can't say it in C/C++ (or Java, C#, Perl, Python, etc.)," and "Don't solve problems that don't exist."

Translated for the XML design world, four of these could read as "If a declaration is more than one page long, split it," "A declaration should be referred to at least once" (we will see next that there might be exceptions), "If you can't say it in English, you can't say it in XML," and, of course, "Don't solve problems that don't exist."

14.1.2. Think Globally

When I started working with W3C XML Schema, I thought that the Russian doll design was the simplest, since it's so close to the structure of the instance documents. Having written the W3C XML Schema reference manual, I am now convinced that flat structures in which all the elements are global are much simpler to document and just as simple to write!

The Russian doll design relies on an analogy between object-oriented programming and markup. This is somewhat misleading: there is no such thing as a private or local object in an XML document (except perhaps if you encode or encrypt some fragments). When you open an XML document, its whole content is exposed; everything is public and needs to be documented with the same level of accuracy.

To describe a concept, give it a name. W3C XML Schema enforces unique names only for global elements. Although the ability to give different content models to the same element name is often presented as an advantage over DTDs, elements defined this way are very confusing to someone reading an instance document. The most convenient way to create a reference manual for an XML vocabulary is through a dictionary of elements. Reusing the same element name for different purposes creates multiple entries that are confusing and difficult to read (like the entries for common words, such as "place," in an English dictionary); the example of W3C XML Schema itself, with its very different meanings for xs:extension, is enlightening.

Therefore, the second piece of advice is to define elements as global whenever possible. Note that this advice doesn't apply to unqualified attributes, which cannot be defined as global.
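As a minimal sketch of the difference, the two fragments below describe the same hypothetical book element. The names book, title, and author are assumptions chosen for illustration, not taken from a particular vocabulary, and no target namespace is declared to keep the references short. The first fragment nests local declarations in the Russian doll manner:

```xml
<!-- Russian doll: title and author are local to book -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The second declares every element globally and assembles the content model from references:

```xml
<!-- Flat: every element is global and referenced by name -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="title" type="xs:string"/>
  <xs:element name="author" type="xs:string"/>
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="title"/>
        <xs:element ref="author" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

In the flat form, each name has exactly one declaration, so a dictionary of elements needs only one entry per name. The price is that title and author also become acceptable as document roots.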

14.1.3. When It's Similar, Show It

The third and last piece of advice contradicts the first one, and a trade-off needs to be found between the two. The first two bits of advice lead to what I call flat schemas. These are similar to our very first example in Chapter 1, "Schema Uses and Development", in which all the elements and attributes are global with local type definitions. This style is easy to read but doesn't highlight the similarities between elements, such as the fact that authors and characters can both be considered persons and share some properties. When strong similarities exist between different elements, using one of the techniques already discussed (either complex type derivation or the composition of element and attribute groups) can enhance the readability of the schema.

The third bit of advice is therefore to use W3C XML Schema features to highlight strong similarities when they are present.
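One way to show such a similarity is complex type derivation, sketched below under stated assumptions. The type name person and the elements name, born, and qualification are hypothetical, picked only to illustrate authors and characters sharing the properties of a person. Elements remain global, and only the shared content model is factored out into a named type:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="name" type="xs:string"/>
  <xs:element name="born" type="xs:date"/>
  <xs:element name="qualification" type="xs:string"/>

  <!-- Properties shared by every person (hypothetical type) -->
  <xs:complexType name="person">
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="born" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:ID"/>
  </xs:complexType>

  <!-- An author is a person with nothing added -->
  <xs:element name="author" type="person"/>

  <!-- A character is a person extended with a qualification -->
  <xs:element name="character">
    <xs:complexType>
      <xs:complexContent>
        <xs:extension base="person">
          <xs:sequence>
            <xs:element ref="qualification"/>
          </xs:sequence>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A reader of this schema sees at once that author and character are both built on person. The trade-off is one more level of indirection to follow, which is why this technique is worth using only when the similarity is strong.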
