Should You Use XHTML? (HTML & XHTML: The Definitive Guide, 4th Edition)

16.4. Should You Use XHTML?

For a document author used to HTML, XHTML is clearly a more painful and less forgiving document markup language. Whereas at one time we prided ourselves on being able to crank out HTML with pencil and paper, it's much more tedious to write XHTML without special document-preparation applications. Why should any author want to take on that extra baggage?

16.4.1. The Dusty Deck Problem

Over just a few years, the Web has been filled with billions of pages. It is a safe bet that many of these pages are not compliant with any defined version of HTML. It is an even safer bet that the vast majority of these pages are not XHTML-compliant.

The harsh reality is that these billions of pages will never be converted to XHTML. Who has the time to go back, root out these old pages, and tweak them to make them XHTML-compliant -- especially when the end result, as perceived by the user, will not change? Like the dusty decks of COBOL programs that lay unchanged for decades before the Y2K problem forced programmers to bring them up to snuff, these dusty decks of web pages will also lie untouched until a similarly dramatic event forces us to update them.

However, the dusty deck problem is no excuse for not writing compliant documents going forward. Leave those old documents alone, but don't create a new conversion problem every time you create a new document. A little effort now will help your documents work across a wider range of browsers in the future.

16.4.2. Automatic Conversion

If your sense of responsibility leads you to undertake the conversion of your existing HTML documents into XHTML, you'll find a utility named Tidy to be exceptionally useful. Written by Dave Raggett, one of the movers and shakers at the W3C, it automates a significant amount of the work required to convert HTML documents into XHTML.

While Tidy's capabilities are too varied and wonderful to be fully listed here, we can at least assure you that Tidy detects and corrects uppercase element and attribute names, unquoted attribute values, and improperly nested elements. For the complete list of features and the latest version of Tidy for any number of computing platforms, visit http://www.w3.org/People/Raggett/tidy/.
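To give a feel for the kind of mechanical repair involved, here is a minimal sketch in Python of two of those corrections: lowercasing names and quoting attribute values. It is not Tidy and does not reflect how Tidy works internally; the class name XhtmlishRewriter and the function rewrite are our own hypothetical names, built on the standard library's html.parser module. The sketch ignores comments, DOCTYPE declarations, and nesting repair.

```python
from html import escape
from html.parser import HTMLParser

# Elements with no content, written as <br /> in XHTML (partial list).
VOID_ELEMENTS = {"area", "base", "br", "col", "hr", "img",
                 "input", "link", "meta", "param"}


class XhtmlishRewriter(HTMLParser):
    """Hypothetical helper: re-emits tags in lowercase with quoted attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def _start(self, tag, attrs, self_closing):
        # HTMLParser has already lowercased the tag and attribute names.
        parts = [tag]
        for name, value in attrs:
            if value is None:
                # A minimized attribute such as CHECKED becomes checked="checked".
                value = name
            parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + (" />" if self_closing else ">"))

    def handle_starttag(self, tag, attrs):
        self._start(tag, attrs, tag in VOID_ELEMENTS)

    def handle_startendtag(self, tag, attrs):
        self._start(tag, attrs, True)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def rewrite(html_text):
    """Return html_text with lowercase tags and quoted attribute values."""
    parser = XhtmlishRewriter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(rewrite("<P ALIGN=center>Hello<BR><INPUT TYPE=checkbox CHECKED></P>"))
# prints: <p align="center">Hello<br /><input type="checkbox" checked="checked" /></p>
```

A real conversion needs far more than this, which is exactly why a mature tool such as Tidy is the better choice for production documents.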

16.4.3. Lenient Browsers and Lazy Authors

There is a good rule of thumb regarding data sharing, especially on the Internet: be lenient in what you accept and strict in what you produce. This is not a commentary on social policy, but rather a pragmatic admonition to tolerate ambiguity and errors in data you receive while making sure that anything you send is scrupulously correct.

Web browsers are good examples of lenient acceptors. Most current web pages have some sort of error in them, albeit often just an error of omission. Nonetheless, browsers accept the error and present a reasonable document to the user. This leniency lets authors get away with all sorts of things, often without even knowing they've made a mistake.

Most authors stop developing a page when it looks good and works the way they want it to. Very few take the time to run their pages through the various HTML-compliance tools to catch potential errors. Many of those who do try to test for compliance are so overwhelmed by the number of minor errors they have committed that they simply give up and continue to create bad pages that can be handled by good browsers.

Since the number of bad pages continues to grow, browsers cannot afford to start being strict. Any browser that tried to enforce even the most basic rules of the HTML standard would be abandoned by users who want to see web pages, not error messages. A vicious cycle ensues: bad pages force the use of lenient browsers, which encourage the creation of more bad pages. Break the cycle by vowing to create only XHTML-compliant content whenever you can.
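The gap between a lenient acceptor and a strict one is easy to demonstrate. The following sketch, in Python, feeds two fragments to the standard library's strict XML parser; the function name is_well_formed and the sample strings are our own illustrations. Note that it checks only XML well-formedness, which is a necessary condition for XHTML compliance but not a full validation against an XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Typical HTML that a browser renders without complaint:
# uppercase names, an unclosed <BR>, and a missing </P>.
sloppy = "<P>One<BR>Two"

# The same content written to XHTML's rules.
strict = "<p>One<br />Two</p>"

print(is_well_formed(sloppy))  # prints: False
print(is_well_formed(strict))  # prints: True
```

A browser that behaved like this parser would show its users an error for the first fragment, which is precisely why no browser can afford to.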

16.4.4. Time, Money, and Standards

XHTML was developed as an XML representation of the HTML standard. It is intended, going forward, to become the single standard everyone should use to create content for the Web.

In a perfect world, standards are universally adopted and used. Full compliance is required of any document before it is placed on the Web. Conversion of legacy documents is done immediately.

In the real world, a shortage of time and money prevents the universal use of standards. Under pressure to quickly deliver something that works, developers turn out pages that work only well enough. Since browsers allow second-rate content to exist on the Web, the need to comply with a standard becomes a secondary issue, one that is too quickly ignored in the dizzying pace of web development.

16.4.5. Man Versus Machine

All is not lost, however. While XHTML is painful and tedious for humans to create, it is quite easy for machines to create. As the number of web authoring tools continues to increase, the pages created by these machines should be completely XHTML-compliant. While it doesn't make much economic sense for a web author to spend a lot of time getting all those end tags in the right spot, it does make sense for the programmer developing an authoring tool to ensure that the tool generates all those correct end tags. The effort expended by the web author is leveraged exactly once for each page; the effort of the tool creator is leveraged over and over, each time the tool produces a new page.

It seems that the real future of XHTML lies in the realm of machine-generated content. XHTML is far too picky to be successfully used by the millions of casual web authors who create small sites. However, if those same authors use a tool to create their pages, they could be generating XHTML-compliant pages and never even know it.

If you are among that small community of developers who create tools that generate HTML output, you are doing a great disservice to your many potential customers if your tool does not generate excruciatingly correct XHTML-compliant output. There is no technical excuse for any tool not to generate XHTML-compliant output. If there are compatibility issues surrounding how the output might be used (with a non-XHTML browser, perhaps), then the tool should provide a switch that lets the author select XHTML-compliant output as an option.
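As a rough illustration of how little effort this takes for a program, the sketch below builds a page as a tree and lets a serializer write the tags, so every element is closed and every special character is escaped by construction. It uses Python's standard xml.etree.ElementTree module; the function build_page and its xhtml switch are hypothetical names of our own, and a real tool would also prepend the appropriate DOCTYPE declaration, which this sketch omits.

```python
import xml.etree.ElementTree as ET

# The XHTML namespace URI defined by the W3C.
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_page(title, paragraphs, xhtml=True):
    """Build a small page as a tree and serialize it.

    xhtml=True emits XML-style markup; xhtml=False emits HTML-style markup.
    """
    html = ET.Element("html", {"xmlns": XHTML_NS})
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(html, "body")
    for text in paragraphs:
        # The serializer escapes <, >, and & in the text for us.
        ET.SubElement(body, "p").text = text
        ET.SubElement(body, "hr")
    method = "xml" if xhtml else "html"
    return ET.tostring(html, encoding="unicode", method=method)


page = build_page("Demo", ["a < b & c"])
print(page)
```

With xhtml=True the empty rule element is written as `<hr />`; with xhtml=False it is written as `<hr>`, which is the kind of output switch described above. The author calling build_page never types a tag at all.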

16.4.6. What To Do?

We recommend that all HTML authors take the time to absorb the differences between HTML and XHTML as we've outlined in this chapter. Given the resources and opportunity, you should try to create XHTML-compliant pages wherever possible for the sites you are creating. Certainly you should choose authoring tools that support XHTML and give you the option of generating XHTML-compliant pages.

One day, XHTML may -- and should -- replace HTML as the official standard language of the Web. Even so, the number of noncompliant pages on the Web will be overwhelming, forcing browsers to honor old HTML constructs and features for at least the next five to ten years. For better or worse, HTML is here to stay as the de facto standard for web authors for years to come.