home | O'Reilly's CD bookshelfs | FreeBSD | Linux | Cisco | Cisco Exam  


Book HomeManaging and Using MySQLSearch this book

Chapter 7. Database Design

Once you have installed the DBMS software on your computer, it can be very tempting to just jump right into creating a database without much thought or planning. As with any software development, this kind of ad hoc approach works only with the simplest of problems. If you expect your database to support any kind of complexity, some planning and design will definitely save you time in the long run. In this chapter, we'll take a look at what constitutes good database design.

7.1. Database Design Primer

Suppose you have a large collection of compact discs and you want to create a database to track them. The first step is to determine what data you are going to store. One good way to start is to think about why you want to store the data in the first place. In our case, we will most likely want to look up CDs by artist, title, and song. Since we want to look up those items, we know they must be included in the database. In addition, it is often useful to simply list items that should be tracked. One possible list might include: CD title, record label, band name, song title. As a starting point, we will store the data shown in Table 7-1.

Table 7-1. A CD database made up of a single table

Band name

CD title

Record label

Songs

Stevie Wonder

Talking Book

Motown

You Are the Sunshine of My Life, Maybe Your Baby, Superstition, ...

Miles Davis Quintet

Miles Smiles

Columbia

Orbits, Circle, ...

Wayne Shorter

Speak No Evil

Blue Note

Witch Hunt, Fee-Fi-Fo-Fum

Herbie Hancock

Headhunters

Columbia

Chameleon, Watermelon Man, ...

Herbie Hancock

Maiden Voyage

Blue Note

Maiden Voyage

For brevity's sake, we have left out most of the songs. At first glance, this table seems to meet our needs since we are storing all the data we need. Upon closer inspection, however, we find several problems. Take the band named Herbie Hancock, for example. "Band name" appears twice: once for each CD. This repetition is a problem for several reasons. First, when entering data in the database, we end up typing the same name over and over. Second, and more important, if any of the data changes, we have to update it in multiple places. For example, what if "Herbie" were misspelled? We would have to update the data in each of the two rows. The same problem would occur if the name Herbie Hancock changes in the future (like Jefferson Airplane changed to Jefferson Starship). As we add more Herbie Hancock CDs to our collection, we add to the effort required to maintain data consistency.

Another problem with the single CD table lies in the way it stores songs. We are storing them in the CD table as a list of songs in a single column. We will run into all sorts of problems if we want to use this data meaningfully. Imagine having to enter and maintain that list. And what if we want to store the length of the songs as well? What if we want to perform a search by song title? It quickly becomes clear that storing the songs in this fashion is undesirable.

This is where database design comes into play. One of the main purposes of database design is to eliminate redundancy from the database. To accomplish this task, we use a technique called normalization. Before we discuss normalization, let's start with some fundamental relational database concepts: entities, attributes, and data models.

7.1.2. Entity Attributes

An attribute describes information about an entity that must be captured. Each entity has zero or more attributes that describe it, and each attribute describes exactly one entity. Each entity instance (row in the table) has exactly one value, possibly NULL, for each of its attributes. An attribute value can be numeric, a character string, a date, a time, or some other basic data value type. In the first step of designing a database, logical data modeling, we do not worry about how the attributes will be stored.

TIP: NULL provides the basis for dealing with missing information. It is specifically used for cases in which you lack a certain piece of information. As an example, consider a CD that does not list the song lengths of each of its tracks. Each song has a length, but you cannot tell from the case what that length is. You do not want to store the length as zero, since that would be incorrect. Instead, you store the length as NULL. If you are thinking you could store it as zero and use zero to mean "unknown," you are falling into one of the same traps that led to one of the Y2K problems. Not only did old systems store years as two digits, but they often gave a special meaning to 9-9-99, assuming it was safe to do that in the expectation that the system would be rewritten long before that date was ever reached.

Our example database refers to a number of things: CD titles, band names, songs, and record labels. Which of these are entities and which are attributes?



Library Navigation Links

Copyright © 2003 O'Reilly & Associates. All rights reserved.