Chapter 8. HTML Overview
HTML (Hypertext Markup Language) is the language used to create web documents. It defines the syntax and placement of the elements that make up the structure of a web document. All web page elements are identified by special tags that give browsers instructions on how to display the content (the tags themselves do not display). Some HTML tags are used to create links to other documents, either locally or over a network such as the Internet.
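As a minimal sketch of these ideas, the fragment below marks up a heading, a paragraph, and a link. The wording is placeholder text chosen for illustration; only the tags matter here. A browser displays the words between the tags, not the tags themselves.

```html
<!-- A heading: the browser shows the text, not the h1 tags -->
<h1>Welcome</h1>

<!-- A paragraph containing a link to another document on the network -->
<p>Standards and specifications are published by the
<a href="http://www.w3.org">World Wide Web Consortium</a>.</p>
```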
This chapter provides a basic introduction to the background and general syntax of HTML, including document structure, tags, and their attributes. It also looks briefly at good HTML style and the pros and cons of using WYSIWYG authoring tools.
For a more in-depth study of HTML, I recommend HTML and XHTML: The Definitive Guide, by Chuck Musciano and Bill Kennedy (O'Reilly, 2000). Another excellent resource for HTML tag information is the HTML Compendium (created by Ron Woodall). The Compendium provides an alphabetical listing of every HTML tag and its attributes, with explanations and up-to-date browser support information for each. The browser support charts accompanying each tag in this book are based on the Compendium. The HTML Compendium can be found at http://www.htmlcompendium.org.
8.1. The HTML Standard
The HTML standard and all other Web-related standards are developed under the authority of the World Wide Web Consortium (W3C). Standards, specifications, and drafts of new proposals can be found at http://www.w3.org. The most recent standard for document markup is the HTML 4.01 specification.
The HTML standard traveled a long, difficult road to its current state of relative stability. Early on, competition between the major web browsers led to a mess of proprietary tags, HTML extensions, and practices that muddied the original intent of HTML in favor of more control over page display.
The W3C has pulled in the reins with the HTML 4.0 specification (which is further refined in the current 4.01 version). It incorporates many of the tags introduced by the popular browsers that improve web functionality. It also officially "deprecates" tags that are used in common practice but are not in keeping with the priorities of the markup language (such as keeping style information out of content).
8.1.1. Keeping Style Separate from Content
Before HTML there was SGML (Standard Generalized Markup Language), which established the system of describing documents in terms of their structure, independent of appearance. SGML is a vast set of rules for developing markup languages such as HTML, but it is so all-encompassing that HTML uses only a small subset of its capabilities.
Publishers began storing SGML versions of their documents so that they could be translated into a variety of end uses. For example, text that is tagged as a heading may be formatted one way if the end product is a printed book, but another way for a CD-ROM. The advantage is that a single source file can be used to create a variety of end products. The way it is interpreted and displayed (i.e., the way it looks) depends on the end use.
Because HTML is one application of an SGML tagging system, the principle of keeping style information separate from the structure of the document remains central to the purpose of HTML. Over the past few years, this ideal has been compromised by the creation of HTML tags that contain explicit style instructions, such as the <font> tag.
Cascading Style Sheets promise to keep style information out of the content by storing all style instructions in a separate document (or a separate section of the source document). With this system in place, the W3C is more determined than ever to clean up the HTML standard so that it works the way it was intended. For more information, see Chapter 17, "Cascading Style Sheets".
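To make the distinction concrete, here is a rough sketch of the same heading styled two ways. The typeface, color, and heading text are arbitrary choices for illustration, not recommendations. The first version mixes presentation into the content with <font>; the second keeps the markup purely structural and moves the styling into a style sheet.

```html
<!-- Presentational markup: style instructions are embedded in the content -->
<h2><font face="Arial" color="#990000">Latest News</font></h2>

<!-- Structural markup plus a style sheet.
     The style element belongs in the head of the document. -->
<style type="text/css">
  h2 { font-family: Arial, sans-serif; color: #990000; }
</style>

<!-- The heading itself now only says what the text is -->
<h2>Latest News</h2>
```

With the second approach, changing the look of every second-level heading means editing one style rule rather than every heading in the document.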
8.1.2. Three Flavors of HTML 4.01
While the W3C has definite ideas on how HTML should work, it is also aware that it will be a while before old browsers are phased out and web authors begin to mark up documents properly. For that reason, the HTML 4.01 specification actually encompasses three slightly different specification documents: one "strict," one "transitional," and one just for framed documents. These documents, called Document Type Definitions (or DTDs), define every tag, attribute, and entity along with the rules for their use. DTDs are written following the rules and conventions of SGML.
The HTML 4.01 Strict DTD excludes all deprecated tags and attributes (those scheduled to be phased out). In an ideal world, all developers would mark up the structure of their documents according to the strict version of HTML, leaving all presentation to be handled by style sheets.
The HTML 4.01 Transitional DTD is less restrictive, and it includes many of the elements dedicated to appearance (such as the <font> tag and the align attribute) that are in common use today. Most developers today comply with the transitional specification because it allows more control over presentation while the industry waits for older browsers (those that don't support new features such as style sheets) to fade away.
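A document states which DTD it follows with a DOCTYPE declaration placed at the very top of the file. The sketch below shows the declarations for the Strict and Transitional DTDs as they appear in the HTML 4.01 specification; a real document would use only one of them. The paragraph at the end is a made-up example of markup that is acceptable under Transitional but not under Strict.

```html
<!-- Strict: deprecated tags and attributes are not allowed -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">

<!-- Transitional: deprecated presentation markup is still permitted -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">

<!-- Allowed under Transitional only: the align attribute and
     the font tag are both deprecated -->
<p align="center"><font size="+1">Welcome to my site</font></p>
```

To satisfy the Strict DTD, the same paragraph would be written as a plain <p> element, with the centering and type size supplied by a style sheet.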
The Frameset DTD is identical to the Transitional DTD, except that it allows for the <frameset> element to be used in place of the standard <body> element. Frames are discussed in Chapter 14, "Frames".
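As a brief illustration, a framed document under this DTD might look like the following sketch. The file names (menu.html, content.html), frame names, and column widths are hypothetical; the DOCTYPE declaration is the one given in the HTML 4.01 specification.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
  "http://www.w3.org/TR/html4/frameset.dtd">
<html>
<head>
  <title>Framed Document</title>
</head>
<!-- The frameset element takes the place of the body element -->
<frameset cols="25%,75%">
  <frame src="menu.html" name="menu">
  <frame src="content.html" name="content">
  <!-- Fallback content for browsers that cannot display frames -->
  <noframes>
    <body>
      <p>This page uses frames. Go to the
      <a href="content.html">main content</a>.</p>
    </body>
  </noframes>
</frameset>
</html>
```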
8.1.3. The Web Standards Movement
After years of frustration coding for incompatible browsers, the web development community finally said, "Enough is enough!" and began putting pressure on the browser developers to change their ways. The charge was led in part by the Web Standards Project (WaSP, http://www.webstandards.org), an industry watchdog group that works diligently to convince the browser developers that it is in everyone's best interest to comply with the established web standards.
Fortunately, the browser developers listened, and things have settled considerably in the last three years. Microsoft Internet Explorer has offered nearly complete support for HTML 4.01 since Version 5.5 for Windows (5.0 for Mac). Netscape's 4.x releases support most of the tags in the HTML 4.0 specification, and its 6.0 release is fully compliant with HTML 4.01 (with very few exceptions). Other browsers, most notably Opera, have stuck to the specifications from the very beginning.
8.1.4. Web Standards in This Book
The intention of this book is to be highly mindful of and compliant with the standards effort. The tag information in the following chapters reflects the current HTML 4.01 Transitional Specification. However, it also represents HTML common practices and includes some tags that are not necessarily part of the standard. In all cases where a tag or attribute is proprietary (works with only one browser) or deprecated by the W3C, it is clearly labeled as such. In this way, I hope to paint a complete picture of HTML while endorsing the standard.
8.1.5. The Future of HTML
According to the W3C, HTML 4.01 is the end of the line for HTML as we know it. The next version of HTML is the XHTML Version 1.0 specification. XHTML is the same HTML specification as we know it today, but rewritten using the new-and-improved rules of XML (Extensible Markup Language). XHTML uses all the same HTML 4.01 tags, but it enforces a set of rules (such as closing all tags, putting attribute values in quotation marks, and keeping tags all lowercase) that make a document "well-formed." Well-formed XHTML will work in next-generation XML-based browsers, where HTML will not. Our current HTML coding standards are incredibly lax by comparison.
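The difference is easiest to see side by side. The fragment below is a rough sketch: the first version is the kind of loose markup that HTML browsers tolerate, and the second is the same content written to the XHTML rules listed above. The text and the image file name (logo.gif) are invented for illustration.

```html
<!-- Lax HTML: uppercase tags, unquoted attribute values, unclosed elements -->
<P ALIGN=center>Here is the first paragraph
<P>Here is a line break<BR>and an image <IMG SRC=logo.gif ALT=Logo>

<!-- XHTML: lowercase tags, quoted values, every element closed -->
<p align="center">Here is the first paragraph</p>
<p>Here is a line break<br />and an image
<img src="logo.gif" alt="Logo" /></p>
```

Elements that have no content, such as <br> and <img>, are closed with a trailing slash inside the tag. Putting a space before the slash helps older HTML browsers read the tag correctly.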
Copyright © 2002 O'Reilly & Associates. All rights reserved.