
HTML & XHTML: The Definitive Guide

2.6. Text

Text-related HTML/XHTML markup tags comprise the richest set of all in the standard languages. That's because the original language -- HTML -- emerged as a way to enrich the structure and organization of text.

HTML came out of academia. What was, and still is, important to those early developers was that their mostly academic, text-oriented documents could be scanned and read without sacrificing the ability to distribute them over the Internet to a wide variety of computer display platforms. (ASCII text is the only universal format on the global Internet.) Multimedia integration is something of an appendage to HTML and XHTML, albeit an important one.

And page layout is secondary to structure. We humans visually scan text and infer its relationships and structure from how it looks; machines can only read encoded markings. Because documents have encoded tags that convey meaning, they lend themselves very well to computer-automated searches and also to the recompilation of content -- features very important to researchers. It's not so much how something is said as what is being said.

Accordingly, neither HTML nor XHTML is a page-layout language. In fact, given the diversity of user-customizable browsers as well as the diversity of computer platforms for retrieval and display of electronic documents, all these markup languages strive to do is advise, not dictate, how the document might look when rendered by the browser. You cannot force the browser to display your document in any certain way. You'll hurt your brain if you insist otherwise.

2.6.1. Appearance of Text

For instance, you cannot predict what font and what absolute size -- 8- or 40-point Helvetica, Geneva, Subway, or whatever -- will be used for a particular user's text display. Okay, so the latest browsers now support standard Cascading Style Sheets and other desktop publishing-like features that let you control the layout and appearance of your documents. But users may change their browser's display characteristics and override your carefully laid plans at will; quite a few of the older browsers out there don't support these new layout features; and some browsers are text-only with no nice fonts at all. What to do? Concentrate on content. Cool pages are a flash in the pan. Deep content will bring people back for more and more.

Nonetheless, style does matter for readability, and it is good to include it where you can, as long as it doesn't interfere with content presentation. You can attach common style attributes to your text with physical style tags like the italic <i> tag in the simple example. More importantly and truer to the language's original purpose, HTML and XHTML have content-based style tags that attach meaning to various text passages. And you can alter text display characteristics, such as font style and size, color, and so on, with Cascading Style Sheets.
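
As a rough illustration of the three approaches just described, the document below marks up text with a physical style tag, with content-based style tags, and with a style-sheet rule. The sentences, the book title, and the class name "aside" are made up for this sketch; how each line actually looks depends on the browser and the user's settings.

```html
<html>
<head>
<title>Three ways to style text</title>
<style type="text/css">
  /* Style-sheet rule; the class name "aside" is hypothetical */
  p.aside { font-style: italic; color: gray }
</style>
</head>
<body>
<!-- Physical style: asks for italic type, says nothing about why -->
<p>This word is <i>italic</i>.</p>

<!-- Content-based style: marks emphasis and a citation; the browser picks the rendering -->
<p>This word is <em>emphasized</em>, and <cite>Moby-Dick</cite> is a cited title.</p>

<!-- Style sheet: the appearance comes from the rule in the document head -->
<p class="aside">This paragraph takes its appearance from the style rule above.</p>
</body>
</html>
```

Many graphical browsers render <i>, <em>, and <cite> the same way, in italics, which is exactly why the content-based tags carry more information than their appearance suggests.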

Today's graphical browsers recognize the physical and content-related text style tags and change the appearance of their related text passage to visually convey meaning or structure. You can't predict exactly what that change will look like.

The HTML 4 and, particularly, the XHTML 1.0 standards stress that future browsers will not be so visually bound. Text content may be heard or even felt, for example, rather than read by viewers. Context clues surely are better in those cases than physical styles.

2.6.2. Text Structures

It's not obvious in our simple example, but the common carriage returns we use to separate paragraphs in our source document have no meaning in HTML or XHTML, except in special circumstances. You could have typed the document onto a single line in your text editor and it would still appear the same in Figure 2-1.[7]

[7]We use a computer programming-like style of indentation so that our source HTML/XHTML documents are more readable. It's not obligatory, nor are there any formal style guidelines for source HTML/XHTML document text formats. We do, however, highly recommend that you adopt a consistent style, so that you and others can easily follow your source documents.
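
To see this for yourself, compare two versions of the same paragraph, as sketched below. The wording is a stand-in rather than the chapter's own example; in a typical browser both versions should display identically.

```html
<!-- Version 1: indented and spread over several source lines -->
<p>
  A short paragraph
      typed across several lines,
  with extra indentation.
</p>

<!-- Version 2: the same content typed on a single source line -->
<p>A short paragraph typed across several lines, with extra indentation.</p>
```

In both versions the browser treats each run of spaces and line endings as a single word separator, so the layout of the source has no effect on the display.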

You'd soon discover, too, if you hadn't read it here first, that except in special cases, browsers typically ignore leading and trailing spaces, and sometimes more than a few in between. (If you look closely at the source example, the line "Greetings from" looks like it should be indented by leading spaces, but it isn't in Figure 2-1.)

Divisions, paragraphs, and line breaks

A browser takes the text in the body of your document and "flows" it onto the computer screen, disregarding any common carriage-return or line-feed characters in the source. The browser fills as much of each line of the display window as possible, beginning flush against the left margin, before stopping after the rightmost word and moving on to the next line. Resize the browser window, and the text reflows to fill the new space, indicating HTML's inherent flexibility.

Of course, readers would rebel if your text just ran on and on, so HTML and XHTML provide both explicit and implicit ways to control the basic structure of your document. The most rudimentary and common ways are with the division (<div>), paragraph (<p>), and line-break (<br>) tags. All break the text flow, which consequently restarts on a new line. The difference is that the <div> and <p> tags each define an elemental region, of the document and of text, respectively, whose contents you may specially align within the browser window, apply text styles to, and alter with other block-related features, whereas the <br> tag only breaks the line.

Without special alignment attributes, the <div> and <br> tags simply break a line of text and place subsequent characters on the next line. The paragraph tag adds more vertical space after the line break than either the <div> or <br> tags. See Section 4.1.1, "The <div> Tag"; Section 4.1.2, "The <p> Tag"; and Section 4.7.1, "The <br> Tag".
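
A minimal sketch of the three tags working together might look like the following. The text is invented for illustration, and the align attribute shown on the <div> tag is the HTML 4 transitional form, which the standard deprecates in favor of style sheets; browsers may honor or ignore it.

```html
<body>
<!-- A division: a block of its own, here with a requested alignment -->
<div align="center">
  This division starts on a new line and is centered where the browser honors align.
</div>

<!-- A paragraph containing a line break -->
<p>
  First paragraph. This line ends with a line break,<br>
  so this text starts on a new line within the same paragraph.
</p>

<!-- A second paragraph, normally set off by extra vertical space -->
<p>
  Second paragraph.
</p>
</body>
```

Note that the <br> tag starts a new line without starting a new paragraph, so the two lines of the first paragraph stay closer together than the two paragraphs do.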

By the way, the HTML standard includes end tags for the paragraph and division tags, but not for the line-break tag.[8] Few authors ever include the paragraph end tag in their documents; the browser usually can figure out where one paragraph ends and another begins.[9] Give yourself a star if you knew that </p> even exists.

[8]With XHTML, <br>'s start and end are between the same brackets: <br />. Browsers tend to be very forgiving and often ignore extraneous things, such as the forward slash in this case, so it's perfectly okay to get into the habit of adding that end-mark.

[9]The paragraph end tag is being used more commonly now that the popular browsers support the paragraph-alignment attribute.
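
The difference between the two conventions can be sketched as follows. The paragraph text is invented, and the two fragments are alternatives, not parts of one document.

```html
<!-- HTML: the paragraph end tag may be omitted, and <br> has no end tag -->
<p>First paragraph, first line<br>
second line
<p>Second paragraph

<!-- XHTML: every paragraph is closed, and the empty <br /> closes itself -->
<p>First paragraph, first line<br />
second line</p>
<p>Second paragraph</p>
```

In the first fragment the browser infers the end of the first paragraph from the start of the second; in the second fragment nothing is left to inference.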


Copyright © 2002 O'Reilly & Associates. All rights reserved.