Chapter 2. Database Design
Once you install your DBMS software on your computer, it can be very
tempting to just jump right into creating a database without much
thought or planning. As with any software development, this kind of
ad hoc approach works with only the simplest of problems. If you
expect your database to support any kind of complexity, some planning
and design will definitely save you time in the long run. You will
need to take a look at what details are important to good database
design.
2.1. Database Design
Suppose
you have a large collection of compact discs and you want to create a
database to track them. The first step is to determine what the data
that you are going to store is about. One good way to start is to
think about why you want to store the data in the first place. In our
case, we most likely want to be able to look up CDs by artist, title,
and song. Since we want to look up those items, we know they must be
included in the database. In addition, it is often useful to simply
list items that should be tracked. One possible list might include:
CD title, record label, band name, song title. As a starting point,
we will store the data in the table shown in Table 2-1.
Table 2-1. A CD Database Made Up of a Single Table
Band Name |
CD Title |
Record Label |
Songs |
Stevie Wonder |
Talking Book |
Motown |
You Are the Sunshine of My Life, Maybe Your Baby, Superstition, . . . |
Miles Davis Quintet |
Miles Smiles |
Columbia |
Orbits, Circle, . . . |
Wayne Shorter |
Speak No Evil |
Blue Note |
Witch Hunt, Fee-Fi-Fo-Fum |
Herbie Hancock |
Headhunters |
Columbia |
Man, . . . |
Herbie Hancock |
Maiden Voyage |
Blue Note |
Maiden Voyage |
(For brevity's sake, we have left out most of the songs.) At
first glance, this table seems like it will meet our needs since we
are storing all of the data we need. Upon closer inspection, however,
we find several problems. Take the example of Herbie Hancock.
"Band Name" is repeated twice: once for each CD. This
repetition is a problem for several reasons. First, when entering
data in the database, we end up typing the same name over and over.
Second, and more important, if any of the data changes, we have to
update it in multiple places. For example, what if
"Herbie" were misspelled? We would have to update the
data in each of the two rows. The same problem would occur if the
name Herbie Hancock changes in the future (à la Jefferson
Airplane or John Cougar). As we add more Herbie Hancock CDs to our
collection, we add to the amount of effort required to maintain data
consistency.
Another problem with the single CD table lies in the way it stores
songs. We are storing them in the CD table as a list of songs in a
single column. We will run into all sorts of problems if we want to
use this data meaningfully. Imagine having to enter and maintain that
list. And what if we want to store the length of the songs as well?
What if we want to perform a search by song title? It quickly becomes
clear that storing the songs in this fashion is undesirable.
This is where database design comes into play. One of the main
purposes of database design is to eliminate redundancy from the
database. To accomplish this task, we use a technique called
normalization. Before we start with normalization, let's start
with some fundamental relational database concepts. A data model is a
diagram that illustrates your database design. It is made up of three
main elements: entities, attributes, and relationships. For now,
let's focus on entities and attributes; we will take a look at
relationships later.
2.1.2. Entity Attributes
An attribute describes information about
an entity that must be captured. Each entity has zero or more
attributes that describe it, and each attribute describes exactly one
entity. Each entity instance (row in the table) has exactly one
value, possibly NULL, for each of its attributes.
An attribute value can be numeric, a character string, date, time, or
some other basic data value. In the first step of database design,
logical data modeling, we do not worry about how the attributes will
be stored.
NOTE
NULL provides the basis for the problem of dealing
with missing information. It is specifically used for the case in
which you lack a certain piece of information. As an example,
consider the situation where a CD does not list the song lengths of
each of its tracks. Each song has a length, but you cannot tell from
the case what that length is. You do not want to store the length as
zero, since that would be incorrect. Instead, you store the length as
NULL. If you are thinking you could store it as
zero and use zero to mean "unknown", you are falling into
one of the same traps that led to one of the Y2K problems. Not only
did old systems store years as two digits, but they often gave a
special meaning to 9-9-99.
Our example database refers to a number of things: the CD, the CD
title, the band name, the songs, and the record label. Which of these
are entities and which are attributes?
2.1.3. Data Model
Notice that
we capture several pieces of data (CD title, band name, etc.) about
each CD, and we absolutely cannot describe a CD without those items.
CD is therefore one of those things we want to capture data about and
is likely an entity. To start a data model, we will diagram it as an
entity. Figure 2-1 shows our sole entity in a data
model.
Figure 2-1. The CD entity in a data model
By common entity naming conventions, an entity name must be singular.
We therefore call the table where we store CDs "CD" and
not "CDs." We use this convention because each entity
names an instance. For example, the "San Francisco 49ers"
is an instance of "Football Team," not "Football
Teams."
At first glance, it appears that the rest of the database describes a
CD. This would indicate that they are attributes of CD. Figure 2-2 adds them to the CD entity
in Figure 2-1. In a data model, attributes appear
as names listed in their entity's box.
Figure 2-2. The CD entity with its attributes
This diagram is simple, but we are not done yet. In fact, we have
only just begun. Earlier, we discussed how the purpose of data
modeling is to eliminate redundancy using a technique called
normalization. We have a nice diagram for our database, but we have
not gotten rid of the redundancy as we set out to do. It is now time
to normalize our database.
data:image/s3,"s3://crabby-images/4b383/4b383961623a58849ac64b7bef94319cdcd23ccf" alt="Previous" | data:image/s3,"s3://crabby-images/7cb2d/7cb2dfb2622046775cb8ca1631b643b4151bb87f" alt="Home" | data:image/s3,"s3://crabby-images/1f361/1f3619a078512ee43df49505e3254ff68272f01a" alt="Next" | 1.4. MySQL and mSQL | data:image/s3,"s3://crabby-images/7fac2/7fac2cebd458a34aaf981e761e443fc2ad5ddabc" alt="Book Index" | 2.2. Normalization |
Copyright © 2001 O'Reilly & Associates. All rights reserved.
|
|