

Web Design in a Nutshell

31.4. Well-Formed XHTML

Web browsers are forgiving of sloppy HTML, but XHTML (being an XML application) requires fastidious attention to every detail. These requirements were outlined briefly in the XML chapter (Chapter 30, "Introduction to XML"), but we'll go over them in this section as they relate specifically to XHTML.

31.4.3. End Tags

In HTML, it is okay to omit the end tags for many block elements (such as <p> and <li>) because the browser is smart enough to close a block element when the next one begins. Not so in XHTML. In order to be well-formed, every container element must have its end tag, or it registers as an error and renders the document noncompliant.
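One way to see the difference is to hand both styles of markup to an XML parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module as the XML software; the helper name is_well_formed and the sample list markup are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML-style list items with the end tags omitted (hypothetical sample)
html_style = "<ul><li>apples<li>oranges</ul>"

# The same list with every container element closed
xhtml_style = "<ul><li>apples</li><li>oranges</li></ul>"

print(is_well_formed(html_style))   # False
print(is_well_formed(xhtml_style))  # True
```

The parser stops at the first unclosed element, so a single missing end tag is enough to make the whole document fail.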

31.4.6. Nesting Requirements

It has always been a rule in HTML that tags should be properly nested within one another. The closing tag of a contained element should always appear before the closing tag of the element that contains it. In XHTML, this rule is strictly enforced. So be sure that your elements are nested correctly, like this:

<b>I can <i>fly!</i></b>

and not overlapping like this:

<b>I can <i>fly!</b></i>

In addition, XHTML enforces other nesting restrictions that have always been a part of the HTML specification. While XML provides no specific way to indicate which elements may not be contained by a given element (this SGML function was dropped in order to make XML more manageable), the XHTML DTD includes a special "Content models for exclusions" note that reinforces the following:

  • The <a> tag cannot contain another <a> tag.

  • The <pre> tag cannot contain <img>, <object>, <applet>, <big>, <small>, <sub>, <sup>, <font>, or <basefont>.

  • The <form> element may not contain other <form> tags.

  • The <button> tag cannot contain <a>, <form>, <input>, <select>, <textarea>, <label>, <button>, <iframe>, or <isindex>.

  • The <label> tag cannot contain other <label> tags.
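A parser that only checks well-formedness will not catch these exclusions, because markup such as a link inside a link is still properly nested. The sketch below shows one way the list above could be checked by walking the parsed tree. The EXCLUSIONS table and the find_violations function are hypothetical names, the table is simply transcribed from the list above, and the code assumes that the restrictions apply at any depth and that the markup carries no XML namespace prefix on its tags.

```python
import xml.etree.ElementTree as ET

# Prohibited descendants, transcribed from the list above (illustrative only)
EXCLUSIONS = {
    "a": {"a"},
    "pre": {"img", "object", "applet", "big", "small",
            "sub", "sup", "font", "basefont"},
    "form": {"form"},
    "button": {"a", "form", "input", "select", "textarea",
               "label", "button", "iframe", "isindex"},
    "label": {"label"},
}


def find_violations(markup):
    """Return (ancestor, descendant) tag pairs that break the exclusions."""
    root = ET.fromstring(markup)
    violations = []

    def walk(element, ancestors):
        # Compare this element against every element that encloses it
        for ancestor in ancestors:
            if element.tag in EXCLUSIONS.get(ancestor, ()):
                violations.append((ancestor, element.tag))
        for child in element:
            walk(child, ancestors + [element.tag])

    walk(root, [])
    return violations


nested_links = '<p><a href="one.html">outer <b><a href="two.html">inner</a></b></a></p>'
print(find_violations(nested_links))  # [('a', 'a')]
```

A validating parser working from the DTD is the more thorough tool; this sketch only covers the five rules listed here.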

31.4.7. Character Entities

XHTML (as an XML application) is extremely fussy about special characters such as <, >, and &. These characters should be represented in the XHTML document by their character entities rather than typed literally. Common character entities are listed in Table 10-3, and the complete list appears in Appendix F, "Character Entities".

Character entity references should be used in place of characters such as < and & in regular text content, as shown in these examples.

<p> the value of A &lt; B </p>
<p> Laverne &amp; Shirley </p>

Even in places where special characters were commonly typed as-is, such as the title of a document or an attribute value, the character entity must be used instead. For instance, the following worked just fine in HTML:

<img src="puppets.jpg" alt="Crocco & Lynch"/>

But in XHTML, the value must be written like this:

<img src="puppets.jpg" alt="Crocco &amp; Lynch"/>
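If the text comes from a script or database rather than being typed by hand, the substitution can be automated. This is a minimal sketch, assuming Python's standard xml.sax.saxutils helpers escape and quoteattr; the variable names are invented for illustration, and the strings are the examples from this section.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# escape() replaces &, <, and > in text content with entity references
title = escape("Laverne & Shirley")
comparison = escape("the value of A < B")

# quoteattr() escapes the value and wraps it in quotation marks
alt = quoteattr("Crocco & Lynch")
img = f'<img src="puppets.jpg" alt={alt}/>'

print(title)       # Laverne &amp; Shirley
print(comparison)  # the value of A &lt; B
print(img)         # <img src="puppets.jpg" alt="Crocco &amp; Lynch"/>

# An XML parser turns the entity back into the original character
print(ET.fromstring(img).get("alt"))  # Crocco & Lynch

# The unescaped HTML-style attribute is rejected outright
try:
    ET.fromstring('<img src="puppets.jpg" alt="Crocco & Lynch"/>')
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False
print(raw_accepted)  # False
```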

31.4.8. Protecting Scripts

It is common practice to enclose scripts and stylesheets in comments (between <!-- and -->). Unfortunately, XML software thinks of comments as unimportant information and may simply remove the comments from a document before processing it. To avoid this problem, use an XML CDATA section instead. Content enclosed in <![CDATA[...]]> is considered simple text characters and is not parsed as potential document elements. For example:

<script type="text/javascript" language="JavaScript">
<![CDATA[
...JavaScript here...
]]>
</script>

The problem with this method is backward compatibility. HTML browsers ignore the contents of the XML CDATA section, while XML browsers ignore the contents of comment-enclosed scripts and stylesheets. So you can't please everyone! One workaround is to put your scripts and styles in separate files and reference them in the document with appropriate external links.
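The XML side of this trade-off can be observed with an ordinary XML parser. The sketch below assumes Python's standard xml.etree.ElementTree module, whose default parser discards comments; other XML software may behave differently. The script contents and variable names are placeholders invented for illustration.

```python
import xml.etree.ElementTree as ET

# A script protected by a CDATA section: < and & need no escaping inside it
cdata_version = "<script><![CDATA[ if (a < b && b < c) { go(); } ]]></script>"

# The same idea using the older comment-hiding technique
comment_version = "<script><!-- if (a > b) { go(); } --></script>"

cdata_text = ET.fromstring(cdata_version).text
comment_text = ET.fromstring(comment_version).text

print(cdata_text.strip())  # if (a < b && b < c) { go(); }
print(comment_text)        # None
```

The CDATA content reaches the application as plain text, while the comment-enclosed script is gone by the time the parser hands over the element.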




Copyright © 2002 O'Reilly & Associates. All rights reserved.