Well-Formed XHTML (Web Design in a Nutshell, 2nd Edition)

31.4. Well-Formed XHTML

Web browsers are forgiving of sloppy HTML, but XHTML (being an XML application) requires fastidious attention to every detail. These requirements were outlined briefly in the XML chapter (Chapter 30, "Introduction to XML"), but we'll go over them in this section as they relate specifically to XHTML.

31.4.1. All-Lowercase Element Names

In XML, all element and attribute names are case-sensitive, which means that <img>, <Img>, and <IMG> are parsed as different elements. In the reformulation of HTML into XHTML, all element and attribute names were defined in lowercase. When writing XHTML documents (and their associated style sheets), be sure that all tags and attributes are written in lowercase.
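
As a quick illustration of the lowercase rule, here is the same image tag written the way HTML browsers tolerate it and then in XHTML form. This is only a sketch; the file name and alt text are made up for the example.

```html
<!-- HTML: uppercase names are tolerated (logo.gif is a hypothetical file) -->
<IMG SRC="logo.gif" ALT="Company logo">

<!-- XHTML: element and attribute names in lowercase -->
<img src="logo.gif" alt="Company logo" />
```

Values such as file names keep whatever case the actual file uses.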

If you want to convert the upper- and mixed-case tags in an existing HTML file to well-formed, all-lowercase tags, try the Tidy utility (mentioned previously) or Bare Bones Software's BBEdit (Macintosh only), which can automate the process.

31.4.2. Quoted Attribute Values

XHTML requires that all attribute values be enclosed in quotation marks. Either single or double quotes are acceptable to an XML parser, although double quotes are the convention. So where previously it was okay to omit the quotes around single words and numeric values, now you need to be careful that every value is quoted.
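
The difference might look like this in a table cell. The attribute values are arbitrary and chosen only for illustration.

```html
<!-- HTML: single-word and numeric values may go unquoted -->
<td align=center width=200>Total</td>

<!-- XHTML: every value is quoted -->
<td align="center" width="200">Total</td>
```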

31.4.3. End Tags

In HTML, it is okay to omit the end tags for many block elements (such as <p> and <li>) because the browser is smart enough to close a block element when the next one begins. Not so in XHTML. In order to be well-formed, every container element must have its end tag, or it registers as an error and renders the document noncompliant.
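
A minimal before-and-after sketch of the end tag rule, using placeholder text:

```html
<!-- HTML: end tags for <p> and <li> may be left out -->
<p>First paragraph.
<p>Second paragraph.
<ul>
<li>Item one
<li>Item two
</ul>

<!-- XHTML: every container element is closed explicitly -->
<p>First paragraph.</p>
<p>Second paragraph.</p>
<ul>
<li>Item one</li>
<li>Item two</li>
</ul>
```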

31.4.4. Empty Elements

This need for closure extends to empty (standalone) elements as well. So instead of just inserting a line break as <br>, XHTML requires that the element be closed as well (<br></br>). Fortunately, you can "close" an empty element simply by adding a slash before its closing bracket. So in XHTML, a line break can be entered as <br/>.

The closed form of an empty element can cause some browsers to complain, so to keep your XHTML code safe for current browsers, be sure to add a space before the closing slash (<br />). This allows the closed empty tag to slide right through.

Of course, the line break isn't the only empty element. Table 31-1 shows all the empty HTML tags in their acceptable XHTML (transitional DTD) forms.

Table 31-1. Empty tags in XHTML format

`<area />`	`<frame />`	`<isindex />`
`<base />`	`<hr />`	`<link />`
`<basefont />`	`<img />`	`<meta />`
`<br />`	`<input />`	`<param />`
`<col />`
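
In practice, most empty elements carry attributes, and the space and slash simply follow the last one. The fragment below is a sketch with hypothetical file and field names:

```html
<!-- photo.jpg and "email" are placeholders, not names from the text -->
<p>
<img src="photo.jpg" alt="Portrait" /><br />
Your address: <input type="text" name="email" />
</p>
<hr />
```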

31.4.5. Explicit Attribute Values

In XHTML, every attribute must have an explicit value. There are many attributes in regular HTML that are standalone instructions that take no value, such as noshade and ismap. In XHTML, attributes without values must now use their own names as values. Therefore, noshade becomes noshade="noshade" and ismap is now ismap="ismap". Table 31-2 lists the attributes that have been given explicit values in XHTML.

Table 31-2. Explicit attribute values

checked="checked"	disabled="disabled"	noresize="noresize"
compact="compact"	ismap="ismap"	nowrap="nowrap"
declare="declare"	multiple="multiple"	readonly="readonly"
defer="defer"	noshade="noshade"	selected="selected"
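
Form controls are where these attributes turn up most often. The following sketch uses made-up control names and option text:

```html
<!-- HTML: minimized attributes -->
<input type="checkbox" name="subscribe" checked>
<select name="color">
<option selected>Red</option>
<option>Blue</option>
</select>

<!-- XHTML: each attribute takes its own name as its value -->
<input type="checkbox" name="subscribe" checked="checked" />
<select name="color">
<option selected="selected">Red</option>
<option>Blue</option>
</select>
```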

31.4.6. Nesting Requirements

It has always been a rule in HTML that tags should be properly nested within one another. The closing tag of a contained element should always appear before the closing tag of the element that contains it. In XHTML, this rule is strictly enforced. So be sure that your elements are nested correctly, like this:

<b>I can <i>fly!</i></b>

and not overlapping like this:

<b>I can <i>fly!</b></i>

In addition, XHTML enforces other nesting restrictions that have always been a part of the HTML specification. While XML provides no specific way to indicate which elements may not be contained by a given element (this SGML function was dropped in order to make XML more manageable), the XHTML DTD includes a special "Content models for exclusions" note that reinforces the following:

The <a> tag cannot contain another <a> tag.
The <pre> tag cannot contain <img>, <object>, <applet>, <big>, <small>, <sub>, <sup>, <font>, or <basefont>.
The <form> element may not contain other <form> tags.
The <button> tag cannot contain <a>, <form>, <fieldset>, <input>, <select>, <textarea>, <label>, <button>, <iframe>, or <isindex>.
The <label> tag cannot contain other <label> tags.
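
To illustrate the first of these restrictions, here is a sketch of a nested link and one way it might be rewritten. The file names are hypothetical.

```html
<!-- Not allowed: an <a> element inside another <a> -->
<a href="index.html">Home page (<a href="news.html">news</a>)</a>

<!-- Allowed: the two links sit side by side -->
<a href="index.html">Home page</a> (<a href="news.html">news</a>)
```

Note that the first version is still properly nested, so a check for well-formedness alone will not flag it.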

31.4.7. Character Entities

XHTML (as an application of XML) is extremely fussy about special characters such as <, >, and &. All special characters should be represented in the XHTML document by their character entities instead. Common character entities are listed in Table 10-3, and the complete list appears in Appendix F, "Character Entities".

Character entity references should be used in place of characters such as < and & in regular text content, as shown in these examples:

<p> the value of A &lt; B </p>
<p> Laverne &amp; Shirley </p>

In places where it was common to use special characters, such as in the title of a document or in an attribute value, you must now use the character entity instead. For instance, the following worked just fine in HTML:

<img src="puppets.jpg" alt="Crocco & Lynch"/>

But in XHTML, the value must be written like this:

<img src="puppets.jpg" alt="Crocco &amp; Lynch"/>

31.4.8. Protecting Scripts

It is common practice to enclose scripts and style sheets in comments (between <!-- and -->). Unfortunately, XML software treats comments as unimportant information and may simply remove them from a document before processing it. To avoid this problem, use an XML CDATA section instead. Content enclosed in <![CDATA[...]]> is treated as plain character data and is not parsed as potential document elements. For example:

<script type="text/javascript" language="JavaScript">
<![CDATA[
...JavaScript here...
]]>
</script>

The problem with this method is backward compatibility. HTML browsers ignore the contents of the XML CDATA section, while XML browsers ignore the contents of comment-enclosed scripts and style sheets. So you can't please everyone! One workaround is to put your scripts and styles in separate files and reference them in the document with appropriate external links.
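
The external-file workaround might look something like the following sketch. The file names scripts.js and styles.css are placeholders, not files referred to elsewhere in this chapter.

```html
<head>
<title>Sample page</title>
<!-- scripts.js and styles.css are hypothetical file names -->
<script type="text/javascript" src="scripts.js"></script>
<link rel="stylesheet" type="text/css" href="styles.css" />
</head>
```

Because <script> is not one of the empty elements listed in Table 31-1, the sketch gives it a separate end tag even though it has no content.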

31.4.9. id and name Attributes

And finally, while the name attribute is used in HTML to identify elements such as document fragments, frames, and images so they can be referenced elsewhere in the document, XHTML prefers the equivalent (and standards-compliant) id attribute. The name attribute has been deprecated in the XHTML 1.0 specification for the elements that once used it for this purpose (a, applet, form, frame, iframe, img, and map). Now that most browsers are HTML 4.0-compliant, you can begin making the transition by using id where you might have used name (or use them both at the same time with the same value).
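
Using both attributes with the same value might look like this for a named anchor. The value "contact" is an arbitrary placeholder.

```html
<!-- The fragment: id and name carry the same (hypothetical) value -->
<h2><a id="contact" name="contact">Contact Information</a></h2>

<!-- A link elsewhere in the document that points to it -->
<a href="#contact">Jump to contact information</a>
```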