

HTML & XHTML: The Definitive Guide

16.3. HTML Versus XHTML

The majority of HTML is completely compatible with XHTML, and this book is devoted to that majority. In this chapter, however, we talk about the minority: where the HTML 4.01 standard and the XHTML DTD differ. If you truly desire to create documents that are both HTML- and XHTML-compliant, you must heed the various warnings and caveats we outline in the following sections.

The biggest difference -- that's Difference with a capital D and that spells difficult -- is that writing XHTML documents requires much more discipline and attention to detail than even the most fastidious HTML author ever dreamed necessary. In W3C parlance, that means your documents must be impeccably "well-formed." Throughout the history of HTML -- and in this book -- authors have been encouraged to create well-formed documents, but for your documents to be considered well-formed by XML standards, you would have to break ranks with the HTML standards.

Nonetheless, your efforts to master XHTML will be rewarded with documents that are well-formed and a sense of satisfaction from playing by the new rules. You will truly benefit in the future, too: through XML, your documents will be able to appear in places you never dreamed would exist (mostly good places, we hope).

16.3.1. Correctly Nested Elements

One requirement of a well-formed XHTML document is that its elements are nested correctly. This isn't any different from HTML standards: simply close the markup elements in the reverse of the order in which you opened them. If one element is within another, the end tag of the inner element must appear before the end tag of the outer element.

Hence, in the following well-formed XHTML segment, we close the italics element before the bold one, because we started italicizing after we started bolding the content:

<b>Close the italics tag <i>first</i></b>.

On the other hand, the following:

<b>Well-formed, this is <i>not!</b></i>

is not well-formed.
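One way to see this rule in action is to feed both fragments to an XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` module as a stand-in for an XML browser's parser; the helper name `is_well_formed` and the dummy `<p>` wrapper are our own assumptions for illustration, and the check covers well-formedness only, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if an XML parser accepts the fragment.

    The fragment is wrapped in a dummy <p> root (an assumption for this
    sketch) so that text after the last element, such as a trailing
    period, is allowed.
    """
    try:
        ET.fromstring("<p>" + fragment + "</p>")
        return True
    except ET.ParseError:
        return False


good = "<b>Close the italics tag <i>first</i></b>."
bad = "<b>Well-formed, this is <i>not!</b></i>"

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False: </b> arrives while <i> is still open
```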

XHTML strictly enforces other nesting restrictions that have always been part of HTML but not always enforced. These restrictions are not formally part of the XHTML DTD; they are instead defined as part of the XHTML standard that is based upon the DTD.[85]

[85]This is hair-splitting within the XHTML standard. The XML standard has no mechanism to define which tags may not be placed within another tag. SGML, upon which XML is based, does have such a feature, but it was removed from XML to make the language easier to use and implement. As a result, these restrictions are simply listed in an appendix of the XHTML standard instead of explicitly defined in the XHTML DTD.

Nesting restrictions include:

  • The <a> tag cannot contain another <a> tag.

  • The <pre> tag cannot contain <img>, <object>, <big>, <small>, <sub>, or <sup>.

  • The <button> tag cannot contain <input>, <select>, <textarea>, <label>, <button>, <form>, <fieldset>, <iframe>, or <isindex>.

  • The <label> tag cannot contain other <label> tags.

  • The <form> tag cannot contain other <form> tags.

These restrictions apply to nesting at any level. For example, an <a> tag cannot contain any other <a> tags, or any tag that in turn contains an <a> tag.
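Because these prohibitions live in an appendix rather than in the DTD, DTD-based validation cannot fully enforce them. The following Python sketch shows one way a hypothetical checker could enforce them after parsing: the `PROHIBITED` table transcribes the list above, while the function names are our own. It looks at element names only and is no substitute for a full validator.

```python
import xml.etree.ElementTree as ET

# Prohibited descendants, transcribed from the XHTML nesting list.
# Each key may not contain any of its listed elements, at any depth.
PROHIBITED = {
    "a": {"a"},
    "pre": {"img", "object", "big", "small", "sub", "sup"},
    "button": {"input", "select", "textarea", "label", "button",
               "form", "fieldset", "iframe", "isindex"},
    "label": {"label"},
    "form": {"form"},
}


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix, such as the XHTML namespace URI."""
    return tag.rsplit("}", 1)[-1]


def nesting_violations(root: ET.Element) -> list[tuple[str, str]]:
    """Return (ancestor, descendant) name pairs that break a rule."""
    violations = []
    for ancestor in root.iter():
        outer = local_name(ancestor.tag)
        banned = PROHIBITED.get(outer)
        if not banned:
            continue
        # iter() yields the element itself first, so skip it explicitly.
        for descendant in ancestor.iter():
            inner = local_name(descendant.tag)
            if descendant is not ancestor and inner in banned:
                violations.append((outer, inner))
    return violations


# The inner <a> sits inside a <span>, but the rule applies at any depth.
doc = ET.fromstring(
    '<div><a href="#top">Back to <span><a href="#x">top</a></span></a></div>'
)
print(nesting_violations(doc))  # [('a', 'a')]
```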

16.3.7. Handling Special Characters

XHTML is more sensitive than HTML is to the use of the < and & characters in JavaScript and CSS declarations within your documents. In HTML, you can avoid potential conflicts by enclosing your scripts and stylesheets in comments (<!-- and -->). XML browsers, however, may simply remove all the contents of comments from your document, thereby deleting your hidden scripts and stylesheets.

To properly shield your special characters from XML browsers, enclose your styles or scripts in a CDATA section. This tells the XML browser that any characters contained within are plain old characters, without special meanings. For example:

<script type="text/javascript">
<![CDATA[
   if (a < b && b < c) { alert("in order"); }
]]>
</script>

This doesn't solve the problem, though. HTML browsers do not recognize CDATA sections, so the <![CDATA[ and ]]> markers end up inside the script or stylesheet itself, where they are likely to cause errors; yet those same browsers honor the contents of comment-enclosed scripts and stylesheets. XML browsers do just the opposite. We recommend that you put your scripts and styles in external files and reference them in your document with the src attribute of the <script> tag or with a <link> tag.
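To make the contrast concrete, the Python sketch below parses the same script three ways with the standard-library `xml.etree.ElementTree` parser, used here as a stand-in for an XML browser. The sample script, the `parses` helper, and the variable names are illustrative assumptions, and a real browser's handling of comments may differ from this parser's.

```python
import xml.etree.ElementTree as ET

# A hypothetical script whose < and & characters confuse an XML parser.
SCRIPT = 'if (a < b && b < c) { alert("in order"); }'

bare_page = f'<script type="text/javascript">{SCRIPT}</script>'
cdata_page = f'<script type="text/javascript"><![CDATA[{SCRIPT}]]></script>'
comment_page = f'<script type="text/javascript"><!--{SCRIPT}--></script>'


def parses(markup: str) -> bool:
    """True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Unshielded: the bare < and & make the document ill-formed.
print(parses(bare_page))  # False

# CDATA: the markers are consumed and the script text survives intact.
print(ET.fromstring(cdata_page).text == SCRIPT)  # True

# Comment: ElementTree's default parser discards comments, so the
# "hidden" script simply disappears from the parsed element.
print(ET.fromstring(comment_page).text)  # None
```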

Special characters in attribute values are problematic in XHTML, as well. In particular, an ampersand within an attribute value must always be written as &amp; and not simply as an & character. A bare less-than sign is also forbidden in attribute values, so encode it as &lt;, and play it safe by encoding greater-than signs as &gt;, too. For example, while:

<img src=seasonings.gif alt="Salt & pepper">

is perfectly valid HTML, it must be written as:

<img src="seasonings.gif" alt="Salt &amp; pepper" />

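to be compliant XHTML. Note that the XHTML version also quotes the src attribute value and closes the empty <img> element with />, as XHTML requires.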
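If you generate XHTML from a program, it is often simpler to let a library do the escaping than to do it by hand. As one possible approach, the Python sketch below uses the standard library's `xml.sax.saxutils.quoteattr` to escape the alt text from the example above, then parses the result with `xml.etree.ElementTree` to show that the parser restores the plain ampersand. The variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

alt_text = "Salt & pepper"

# quoteattr() escapes &, <, and > and adds the surrounding quotes.
alt_attr = quoteattr(alt_text)
print(alt_attr)  # "Salt &amp; pepper"

img_tag = f'<img src="seasonings.gif" alt={alt_attr} />'
print(img_tag)  # <img src="seasonings.gif" alt="Salt &amp; pepper" />

# An XML parser accepts the escaped form and restores the plain ampersand.
print(ET.fromstring(img_tag).get("alt"))  # Salt & pepper

# The HTML form (unquoted src, bare &, unclosed tag) is rejected.
try:
    ET.fromstring('<img src=seasonings.gif alt="Salt & pepper">')
    html_form_ok = True
except ET.ParseError:
    html_form_ok = False
print(html_form_ok)  # False
```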




Copyright © 2002 O'Reilly & Associates. All rights reserved.