Beyond that, large documents are generally broken up into sections of
some kind, perhaps chapters for a book, parts for an article, or
claims for a legal brief. Most of the document consists of these
primary sections. In some cases, there'll be several
different kinds of sections; for instance, one for the table of
contents, one for the index, and one for the chapters of a book.
The paragraphs and other block-level items will mostly contain words
in a row, that is, text. Some of this text may be marked up with
inline elements. For instance, you may wish to indicate that a
particular string of text inside the block-level element is a date, a
person, or simply important. However, most of the text will not be so
annotated.
One area in which different XML applications diverge is the question
of whether block-level items may contain other block-level items. For
instance, can a paragraph contain a list? Or can a list item contain
a paragraph? It's probably easier to work with more rigidly
structured documents in which blocks can't contain other blocks
(particularly other instances of the same kind). However, a block
often has a very good reason to contain other blocks. For instance,
a long list item or quotation may contain several paragraphs.
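
To make the nesting question concrete, here is a minimal Python sketch that reports which block-level elements directly contain other blocks. The element names (para, list, item, quote) and the sample document are assumptions made up for illustration; they do not come from any particular XML application.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: these block-level element names are
# assumptions for illustration only.
BLOCKS = {"para", "list", "item", "quote"}

SAMPLE = """<doc>
  <para>A short paragraph.</para>
  <list>
    <item>
      <para>First paragraph of a long item.</para>
      <para>Second paragraph of the same item.</para>
    </item>
  </list>
</doc>"""


def nested_blocks(root):
    """Return (parent, child) tag pairs where a block directly contains a block."""
    pairs = []
    for parent in root.iter():          # document order, root included
        if parent.tag in BLOCKS:
            for child in parent:        # direct children only
                if child.tag in BLOCKS:
                    pairs.append((parent.tag, child.tag))
    return pairs


pairs = nested_blocks(ET.fromstring(SAMPLE))
```

An application that forbids nested blocks would expect this list to be empty. Here it is not, because the list item has a good reason to hold two paragraphs.
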
For the most part, this entire structure from the root down to the
most deeply nested inline item tends to be quite linear; that is, you
expect that a person will read the words in pretty much the same
order they appear in the document. If all the markup were suddenly
removed and you were left with nothing but the raw text, the result
should be more or less legible. The markup can be used to index or
format the document, but it's not a fundamental part
of the content.
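
The claim that narrative text survives the loss of its markup can be checked with a short Python sketch. The sample paragraph and its inline element names (date, person, em) are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical narrative fragment with a few inline elements.
SAMPLE = (
    "<para>The meeting on <date>July 4</date> was chaired by "
    "<person>Ada Lovelace</person>, which is <em>important</em>.</para>"
)


def strip_markup(xml_text):
    """Discard every tag and keep only the character data, in document order."""
    root = ET.fromstring(xml_text)
    return "".join(root.itertext())


plain = strip_markup(SAMPLE)
```

With the tags gone, `plain` is still an ordinary, readable sentence. The same treatment applied to a record-like document would typically leave a run of values with nothing to say what each one means.
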
This narrative orientation explains why DTDs don't provide strong
(or really any) data typing. The documents for which SGML was
designed didn't need it. XML documents that are doing jobs for which
SGML wasn't designed, such as tracking inventories or census data, do
need data typing; that's why various people and organizations have
invented a plethora of schema languages. However, schemas really
don't improve on DTDs for narrative documents.
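
As a rough illustration of the gap, consider a record-like document. A DTD can declare an element's content only as text, for example `<!ELEMENT quantity (#PCDATA)>`, so it cannot distinguish a number from any other string. The sketch below assumes a made-up inventory record and a hypothetical helper, `read_quantity`, that performs the type check a DTD cannot.

```python
import xml.etree.ElementTree as ET

# Hypothetical inventory records. Under a declaration such as
# <!ELEMENT quantity (#PCDATA)>, both would be equally valid.
GOOD = "<item><name>widget</name><quantity>12</quantity></item>"
BAD = "<item><name>widget</name><quantity>twelve</quantity></item>"


def read_quantity(xml_text):
    """Return the quantity as an int, or None if its text is not an integer."""
    text = ET.fromstring(xml_text).findtext("quantity", default="")
    try:
        return int(text.strip())
    except ValueError:
        return None
```

Here the application code has to reject "twelve" itself. A schema language with data types moves that check into validation, which matters for inventories but adds little for prose.
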