home | O'Reilly's CD bookshelfs | FreeBSD | Linux | Cisco | Cisco Exam  


Book HomeMySQL and mSQLSearch this book

6.3. SQL Datatypes

In a table, each column has a type. As we mentioned earlier, a SQL datatype is similar to a datatype in traditional programming languages. While many languages define a bare-minimum set of types necessary for completeness, SQL goes out of its way to provide types such as MONEY and DATE that will be useful to every day users. You could store a MONEY type in a more basic numeric type, but having a type specifically dedicated to the nuances of money processing helps add to SQL's ease of use -- one of SQL's primary goals.

Chapter 15, "SQL Reference", provides a full reference of SQL types supported by MySQL or mSQL. Table 6-1 is an abbreviated listing of the most common types supported in both languages.

Table 6-1. The Most Often Used Datatypes Common to Both MySQL and mSQL

Datatype

Description

INT

An integer value. MySQL allows an INT to be either signed or unsigned, while mSQL provides a distinct type, UINT, for unsigned integers.

REAL

A floating point value. This type offers a greater range and precision than the INT type, but it does not have the exactness of an INT.

CHAR(length)

A fixed-length character value. No CHAR fields can hold strings greater in length than the specified value. Fields of lesser length are padded with spaces. This type is likely the most commonly used type in any SQL implementation.

TEXT(length)

A variable length character value. In mSQL, the given length is used as a suggestion as to how long the strings being stored will be. You may store larger values, but at a performance cost. Under MySQL, TEXT is just one of many variable-length datatypes.

DATE

A standard date value. While the format for storing a date differs between MySQL and mSQL, both database engines are capable of using the DATE type to store arbitrary dates for the past, present, and future. Both database engines are Y2K compliant in their date storage.

TIME

A standard time value. This type stores the time of day independent of a particular date. When used together with a date, a specific date and time can be stored. MySQL additionally supplies a DATETIME type that will store date and time together in one field.

NOTE

MySQL supports the UNSIGNED attribute for all numeric types. This modifier forces the column to accept only positive (unsigned) numbers. Unsigned fields have an upper limit that is double that of their signed counterparts. An unsigned TINYINT -- MySQL's single byte numeric type -- has a range of to 255 instead of the -127 to 127 range of its signed counterpart.

Both database engines provide more types than those mentioned above. MySQL, in particular, is very rich in the number of datatypes it supports. In day-to-day programming, however, you will find yourself using mostly the types mentioned earlier. With mSQL, choosing a datatype is pretty much as simple as picking the type that most closely resembles the data you want to store. The size of the data you wish to store, however, plays a much larger role in designing MySQL tables.

6.3.2. Character Types

Managing character types is a little more complicated. Not only do you have to worry about the minimum and maximum string lengths, but you also have to worry about the average size, the amount of variation likely, and the need for indexing. For our current purposes, an index is a field or combination of fields on which you plan to search -- basically, the fields in your WHERE clause. Indexing is, however, much more complicated than this simplistic description, and we will cover indexing later in the chapter. The important fact to note here is that indexing one character fields works best when the field is fixed length. In fact, mSQL does not even provide an indexible variable-length character field! If there is little -- or, preferably, no -- variation in the length of your character-based fields, then a CHAR type is likely the right answer. An example of a good candidate for a CHAR field is a country code. The ISO provides a comprehensive list of standard two-character representations of country codes (US for the U.S.A., FR for France, etc.).[9] Since these codes are always exactly two characters, a CHAR(2) is always the right answer for this field.

[9]Don't be lulled into believing states/provinces work this way. If you want to write an application that works in an international environment and stores state/province codes, make sure to make it a CHAR(3) since Australia uses three-character state codes. Also note that there is a 3-character ISO country-code standard.

A value does not need to be invariant in its length to be a candidate for a CHAR field. It should, however, have very little variance. Phone numbers, for example, can be stored safely in a CHAR(13) field even though phone number length varies from nation to nation. The variance simply is not that great, so there is no value to making a phone number field variable in length. The important thing to keep in mind with a CHAR field is that no matter how big the actual string being stored is, the field always takes up exactly the number of characters specified as the field's size -- no more, no less. Any difference between the length of the text being stored and the length of the field is made up by padding the value with spaces. While the few potential extra characters being wasted on a subset of the phone number data is not anything to worry about, you do not want to be wasting much more. Variable-length text fields meet this need.

A good, common example of a field that demands a variable-length datatype is a web URL. Most web addresses can fit into a relatively small amount of space -- http://www.ora.com, http://www.hughes.com.au, http://www.mysql.com -- and consequentially do not represent a problem. Occasionally, however, you will run into web addresses like:

http://www.winespectator.com/Wine/Spectator/_notes|5527293926834323221480431354?Xv11=&Xr5=&Xv1=&type-region-search-code=&Xa14=flora+springs&Xv4=.

If you construct a CHAR field large enough to hold that URL, you will be wasting a significant amount of space for most every other URL being stored. Variable-length fields let you define a field length that can store the odd, long-length value while not wasting all that space for the common, short-length values. MySQL and mSQL each take separate approaches to this problem.

6.3.2.1. Variable-length character fields in MySQL

If you are using only mSQL, you can skip this section. The advantage of variable-length text fields under MySQL is that such fields use precisely the minimum storage space required to store an individual field. A VARCHAR(255) column that holds the string "hello world," for example, only takes up 12 bytes (one byte for each character plus an extra byte to store the length).

NOTE

In opposition to the ANSI standard, VARCHAR in MySQL fields are not padded. Any extra spaces are removed from a value before it is stored.

You cannot store strings whose lengths are greater than the field length you have specified. With a VARCHAR(4) field, you can store at most a string with 4 characters. If you attempt to store the string "happy birthday," MySQL will truncate the string to "happ." The downside of the MySQL approach to variable-length text fields over the mSQL approach is that there is no way to store the odd string that exceeds your designated field size. Table 6-1 shows the storage space required to store the 144 character Wine Spectator URL shown above along with an average-sized 30 character URL.

Table 6-1. The Storage Space Required by the Different MySQL Character Types

Datatype

Storage for a 144 Character String

Storage for a 30 Character String

Maximum String Size

CHAR(150)

150

150

255

VARCHAR(150)

145

31

255

TINYTEXT(150)

145

31

255

TEXT(150)

146

32

65535

MEDIUMTEXT(150)

147

33

16777215

LONGTEXT(150)

148

34

4294967295

If, after years of uptime with your database, you find that the world has changed and a field that once comfortably existed as a VARCHAR(25) now must be able to hold strings as long as 30 characters, you are not out of luck. MySQL provides a command called ALTER TABLE that enables you to redefine a field type without losing any data.

ALTER TABLE mytable MODIFY mycolumn LONGTEXT

6.3.2.2. Variable-length character fields in mSQL

You can skip this section if you are only interested in MySQL. Variable-length character fields in mSQL enable you to define a field's length to be the size of the average character string length it will hold. While every value you insert into this field will still take up at least the amount you specify, it can hold more. The database does this by creating an overflow table to hold the extra data. The downside of this approach comes in the form of performance and the inability to index variable-length fields.

Let's take a moment to examine the impact of different design choices with mSQL. In order to store all of the above URLs in a CHAR field, we would need to have a CHAR(144) column. Under this scenario, the four URLs in question would take up 576 bytes (144x3), even though you are only actually storing 216 bytes of data. The other 360 bytes is simply wasted space. If you multiple that times thousands or millions of rows, you can easily see how this becomes a serious problem. Using a variable-length TEXT(30) field, however, only 234 bytes (30x3+144) are required to store the 216 bytes of data. Only 18 bytes are wasted. That is a 41% savings!



Library Navigation Links

Copyright © 2001 O'Reilly & Associates. All rights reserved.