Troff mixes formatting instructions with data. The instructions are
symbols composed of characters, with a special syntax so a troff
interpreter can tell the two apart. For example, the symbol
\fI changes the current font style to italic.
Without the backslash character, it would be treated as data. This
mixture of instructions and data is called
markup.
Troff can be even more detailed than that. The instruction
.sp 18p tells the formatter to
insert 18 points of vertical space at the point in the document
where the instruction appears. Beyond aesthetics, we
can't tell just by looking at it what purpose this
spacing serves; it gives a very specific instruction to the processor
that can't be interpreted in any other way. This
instruction is fine if you only want to prepare a document for
printing in a specific style. If you want to make changes, though, it
can be quite painful.
Suppose you've marked up a book in troff so that
every newly defined term is in boldface. Your document has thousands
of bold font instructions in it. You're happy and
ready to send it to the printer when suddenly, you get a call from
the design department. They tell you that the design has changed and
they now want the new terms to be formatted as italic. Now you have a
problem. You have to turn every bold instruction for a new term into
an italic instruction.
Your first thought is to open the document in your editor and do a
search-and-replace maneuver. But, to your horror, you realize that
new terms aren't the only places where you used bold
font instructions. You also used them for emphasis and for proper
nouns, meaning that a global replace would also mangle these
instances, which you definitely don't want. You can
change the right instructions only by going through them one at a
time, which could take hours, if not days.
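The problem can be seen in a few lines of Python. This is only a sketch: the troff fragment below is a made-up sample, not text from a real document, and it assumes \fB switches to bold and \fP returns to the previous font. Because the same instruction marks both a new term and plain emphasis, a global replace cannot tell them apart.

```python
# Hypothetical troff fragment: \fB marks a new term ("parser")
# and, separately, plain emphasis ("not").
source = (
    r"A \fBparser\fP reads markup. "
    r"Do \fBnot\fP edit the output by hand."
)

# Naive global replace: every bold instruction becomes italic,
# whether it marked a new term or not.
converted = source.replace(r"\fB", r"\fI")

print(converted)
# A \fIparser\fP reads markup. Do \fInot\fP edit the output by hand.
```

Both instructions were changed, although only the first one marked a new term. Nothing in the markup records why the bold was applied, so no pattern can select the right instances.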
Generic coding was a breakthrough for digital content. Finally,
content could be described by what it was, instead of by how it should
be displayed. Something like this looks more like a database record
than a word-processing file:
<personnel-record>
  <name>
    <first>Rita</first>
    <last>Book</last>
  </name>
  <birthday>
    <year>1969</year>
    <month>4</month>
    <day>23</day>
  </birthday>
</personnel-record>
Notice the lack of presentational information. You can format the
name any way you want: first name then last name, or last name first,
with a comma. You could format the date in American style (4/23/1969)
or European (23/4/1969) simply by specifying whether the
<month> or <day>
element should present its contents first. The document
doesn't dictate its use, which makes it useful as a
source document for multiple destinations.
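As a rough illustration of one source feeding several presentations, the sketch below reads the record with Python's standard xml.etree.ElementTree module and formats it in different ways. The function names format_name and format_birthday, and their parameters, are invented for this example; they are not part of any library.

```python
import xml.etree.ElementTree as ET

RECORD = """
<personnel-record>
  <name>
    <first>Rita</first>
    <last>Book</last>
  </name>
  <birthday>
    <year>1969</year>
    <month>4</month>
    <day>23</day>
  </birthday>
</personnel-record>
"""


def format_name(record, last_first=False):
    """Return the name as 'First Last' or 'Last, First' (hypothetical helper)."""
    first = record.findtext("name/first")
    last = record.findtext("name/last")
    return f"{last}, {first}" if last_first else f"{first} {last}"


def format_birthday(record, style="american"):
    """Return the date as month/day/year or day/month/year (hypothetical helper)."""
    year = record.findtext("birthday/year")
    month = record.findtext("birthday/month")
    day = record.findtext("birthday/day")
    if style == "american":
        return f"{month}/{day}/{year}"
    return f"{day}/{month}/{year}"


record = ET.fromstring(RECORD)
print(format_name(record))                   # Rita Book
print(format_name(record, last_first=True))  # Book, Rita
print(format_birthday(record))               # 4/23/1969
print(format_birthday(record, "european"))   # 23/4/1969
```

The markup never changes. Each presentation decision lives in the code that consumes the document, which is the point of describing content rather than appearance.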
In spite of its revolutionary capabilities, SGML never really caught
on with small companies the way it did with the big ones. SGML software
is expensive and bulky. It takes a team of developers to set up and
configure a production environment around SGML. SGML feels
bureaucratic, confusing, and resource-heavy. Thus, SGML in its
original form was not ready to take the world by storm.
So the standards folk decided to try again and see whether they
could arrive at a compromise between the
descriptive power of SGML and the simplicity of HTML. They came up
with the Extensible Markup Language (XML). The
"X" stands for
"extensible," pointing out the
first obvious difference from HTML, which is that some people think
that "X" is a cooler-sounding
letter than "E" when used in an
acronym. The second and more relevant difference is that your
documents don't have to be stuck in the anemic tag
set of HTML. You can extend the tag vocabulary to be as descriptive as
you want, even as descriptive as SGML. Voilà! The
bridge is built.
Every developer should have working knowledge of XML, since
it's the universal packing material for data, and so
many programs are all about crunching data. The rest of this chapter
gives a quick introduction to XML for developers.