home | O'Reilly's CD bookshelfs | FreeBSD | Linux | Cisco | Cisco Exam  


Book HomeHTML & XHTML: The Definitive GuideSearch this book

Chapter 4. Text Basics

Any successful presentation, even a thoughtful tome, should have its text organized into an attractive, effective document. Organizing text into attractive documents is HTML and XHTML's forte. The languages give you a number of tools that help you mold your text and get your message across. They also help structure your document so that your target audience has easy access to your words.

Always keep in mind while designing your documents (here we go again!) that the markup tags, particularly in regard to text, only advise -- they do not dictate -- how a browser will ultimately render the document. Rendering varies from browser to browser. Don't get too entangled with trying to get just the right look and layout. Your attempts may and probably will be thwarted by the browser.

4.1. Divisions and Paragraphs

Like most text processors, a browser wraps the words it finds to fit the horizontal width of its viewing window. Widen the browser's window and words automatically flow up to fill the wider lines. Squeeze the window and words wrap downwards.

Unlike most text processors, however, HTML and XHTML use explicit division (<div>), paragraph (<p>), and line-break (<br>) tags to control the alignment and flow of text. Return characters, although quite useful for readability of the source document, typically are ignored by the browser -- authors must use the <br> tag to explicitly force a common text line break. The <p> tag, while also performing the task, carries with it meaning and effects beyond a simple line break.

The <div> tag is a little different. Originally codified in the HTML 3.2 standard, <div> was included in the language to be a simple organizational tool -- to divide the document into discrete sections -- whose somewhat obtuse meaning meant few authors used it. But recent innovations -- alignment, styles, and the id attribute for document referencing and automation -- now let you more distinctly label and thereby define individual sections of your documents, as well as control the alignment and appearance of those sections. These features breathe real life and meaning into the <div> tag.

By associating an id and a class name with the various sections of your document, each delimited by a <div id=name class=name> tag and attributes (you can do the same with other tags like <p>, too), you not only label those divisions for later reference by a hyperlink and for automated processing and management (collect all the bibliography divisions, for instance), but you may also define different, distinct display styles for those portions of your document. For instance, you might define one divisional class for your document's abstract (<div class=abstract>, for example), another for the body, a third for the conclusion, and a fourth divisional class for the bibliography (<div class=biblio>, for example).

Each class, then, might be given a different display definition in a document-level or externally related style sheet: the abstract indented and in an italic typeface (such as div.abstract {left-margin: +0.5in; font-style: italic}); the body in a left-justified roman typeface; the conclusion similar to the abstract; and the bibliography automatically numbered and formatted appropriately.

We provide a detailed description of style sheets, classes, and their applications in Chapter 8, "Cascading Style Sheets".

4.1.1. The <div> Tag

As defined in the HTML 4.01 and XHTML 1.0 standards, the <div> tag divides your document into separate, distinct sections. It may be used strictly as an organizational tool, without any sort of formatting associated with it; it becomes more effective if you add the id and class attributes to label the division. The <div> tag may also be combined with the align attribute to control the alignment of whole sections of your document's content in the display and with the many programmatic "on" attributes for user interaction.

<div>

Function:

Defines a block of text

Attributes:

ALIGN

ONKEYPRESS

CLASS

ONKEYUP

DIR

ONMOUSEDOWN

ID

ONMOUSEMOVE

LANG

ONMOUSEOUT

NOWRAP

ONMOUSEOVER

ONCLICK

ONMOUSEUP

ONDBLCLICK

STYLE

ONKEYDOWN

TITLE

End tag:

</div>; usually omitted in HTML

Contains:

body_content

Used in:

block

4.1.1.4. The id attribute

Use the id attribute to label the document division specially for later reference by a hyperlink, style sheet, applet, or other automated process. An acceptable id value is any quote-enclosed string that uniquely identifies the division and that later can be used to reference that document section unambiguously. Although we're introducing it within the context of the <div> tag, this attribute can be used with almost any tag.

When used as an element label, the value of the id attribute can be added to a URL to address the labelled element uniquely within the document. You can label both large portions of content (via a tag like <div>) or small snippets of text (using a tag like <i> or <span>). For example, you might label the abstract of a technical report using <div id="abstract">. A URL could jump right to that abstract by referencing report.html#abstract. When used in this manner, the value of the id attribute must be unique with respect to all other id attributes within the document, and all the names defined by any <a> tags with the name attribute. Section 6.3.3, "Linking Within a Document"

When used as a style-sheet selector, the value of the id attribute is the name of a style rule that can be associated with the current tag. This provides a second set of definable style rules, similar to the various style classes you can create. A tag can use both the class and id attributes to apply two different rules to a single tag. In this usage, the name associated with the id attribute must be unique with respect to all other style IDs within the current document. A more complete description of style classes and IDs can be found in Chapter 8, "Cascading Style Sheets".

4.1.2. The <p> Tag

The <p> tag signals the start of a paragraph. That's not well-known even by some veteran webmasters, because it runs counterintuitive to what we've come to expect from experience. Most word processors we're familiar with use just one special character, typically the return character, to signal the end of a paragraph. In HTML and XHTML, each paragraph should start with <p> and ends with the corresponding </p> tag. And while a sequence of newline characters in a text processor-displayed document creates an empty paragraph for each one, browsers typically ignore all but the first paragraph tag.

In practice, with HTML you can ignore the starting <p> tag at the beginning of the first paragraph, and the </p> tag at the end of paragraphs: they can be implied from other tags that occur in the document, and hence safely omitted.[20]

[20]XHTML, on the other hand, requires explicit starting and ending tags.

For example:

<body>
This is the first paragraph, at the very beginning of the 
body of this document.
<p>
The tag above signals the start of this second paragraph. 
When rendered by a browser, it will begin slightly below the 
end of the first paragraph, with a bit of extra white space 
between the two paragraphs.
<p>
This is the last paragraph in the example.
</body>

Notice that we haven't included the paragraph start tag (<p>) for the first paragraph or any end paragraph tags at all in the HTML example; they can be unambiguously inferred by the browser and are therefore unnecessary.

<p>

Function:

Defines a paragraph of text

Attributes:

ALIGN

ONKEYUP

CLASS

ONMOUSEDOWN

DIR

ONMOUSEMOVE

ID

ONMOUSEOUT

LANG

ONMOUSEOVER

ONCLICK

ONMOUSEUP

ONDBLCLICK

STYLE

ONKEYDOWN

TITLE

ONKEYPRESS

End tag:

</p>; often omitted in HTML

Contains:

text

Used in:

block

In general, you'll find that human document authors tend to omit postulated tags whenever possible while automatic document generators tend to insert them. That may be because the software designers didn't want to run the risk of having their product chided by competitors as not adhering to the HTML standard, even though we're splitting letter-of-the-law hairs here. Go ahead and be defiant: omit that first paragraph's <p> tag and don't give a second thought to paragraph ending </p> tags, provided, of course, that your document's structure and clarity are not compromised. That is, as long as you are aware that XHTML frowns severely on such laxity.

4.1.2.6. Allowed paragraph content

A paragraph may contain any element allowed in a text flow, including conventional words and punctuation, links (<a>), images (<img>), line breaks (<br>), font changes (<b>, <i>, <tt>, <u>, <strike>, <big>, <small>, <sup>, <sub>, and <font>), and content-based style changes (<acronym>, <cite>, <code>, <dfn>, <em>, <kbd>, <samp>, <strong>, and <var>). If any other element occurs within the paragraph, it implies that the paragraph has ended, and the browser assumes that the closing </p> tag was not specified.



Library Navigation Links

Copyright © 2002 O'Reilly & Associates. All rights reserved.