6.3. SQL Datatypes
In a table, each column has a type. As we mentioned earlier, a SQL datatype is similar to a datatype in traditional programming languages. While many languages define a bare-minimum set of types necessary for completeness, SQL goes out of its way to provide types such as MONEY and DATE that will be useful to every day users. You could store a MONEY type in a more basic numeric type, but having a type specifically dedicated to the nuances of money processing helps add to SQL's ease of use -- one of SQL's primary goals.
Table 6-1. The Most Often Used Datatypes Common to Both MySQL and mSQL
Both database engines provide more types than those mentioned above. MySQL, in particular, is very rich in the number of datatypes it supports. In day-to-day programming, however, you will find yourself using mostly the types mentioned earlier. With mSQL, choosing a datatype is pretty much as simple as picking the type that most closely resembles the data you want to store. The size of the data you wish to store, however, plays a much larger role in designing MySQL tables.
6.3.1. Numeric Types
Before you create a table, you should have a good idea of what kind of data you wish to store in the table. Beyond obvious decisions about whether your data is character-based or numeric, you should know the approximate size of the data to be stored. If it is a numeric field, what is its maximum possible value? What is its minimum possible value? Could that change in the future? If the minimum is always positive, you should consider an unsigned type. You should always choose the smallest numeric type that can support your largest conceivable value. If, for example, we had a field that represented the population of a state, we would use an unsigned INT field. No state can have a negative population. Furthermore, in order for an unsigned INT field not to be able to hold a number representing a state's population, that state's population would have to be roughly the population of the entire Earth.
6.3.2. Character Types
Managing character types is a little more complicated. Not only do you have to worry about the minimum and maximum string lengths, but you also have to worry about the average size, the amount of variation likely, and the need for indexing. For our current purposes, an index is a field or combination of fields on which you plan to search -- basically, the fields in your WHERE clause. Indexing is, however, much more complicated than this simplistic description, and we will cover indexing later in the chapter. The important fact to note here is that indexing one character fields works best when the field is fixed length. In fact, mSQL does not even provide an indexible variable-length character field! If there is little -- or, preferably, no -- variation in the length of your character-based fields, then a CHAR type is likely the right answer. An example of a good candidate for a CHAR field is a country code. The ISO provides a comprehensive list of standard two-character representations of country codes (US for the U.S.A., FR for France, etc.). Since these codes are always exactly two characters, a CHAR(2) is always the right answer for this field.
A value does not need to be invariant in its length to be a candidate for a CHAR field. It should, however, have very little variance. Phone numbers, for example, can be stored safely in a CHAR(13) field even though phone number length varies from nation to nation. The variance simply is not that great, so there is no value to making a phone number field variable in length. The important thing to keep in mind with a CHAR field is that no matter how big the actual string being stored is, the field always takes up exactly the number of characters specified as the field's size -- no more, no less. Any difference between the length of the text being stored and the length of the field is made up by padding the value with spaces. While the few potential extra characters being wasted on a subset of the phone number data is not anything to worry about, you do not want to be wasting much more. Variable-length text fields meet this need.
A good, common example of a field that demands a variable-length datatype is a web URL. Most web addresses can fit into a relatively small amount of space -- http://www.ora.com, http://www.hughes.com.au, http://www.mysql.com -- and consequentially do not represent a problem. Occasionally, however, you will run into web addresses like:
If you construct a CHAR field large enough to hold that URL, you will be wasting a significant amount of space for most every other URL being stored. Variable-length fields let you define a field length that can store the odd, long-length value while not wasting all that space for the common, short-length values. MySQL and mSQL each take separate approaches to this problem.
184.108.40.206. Variable-length character fields in MySQL
If you are using only mSQL, you can skip this section. The advantage of variable-length text fields under MySQL is that such fields use precisely the minimum storage space required to store an individual field. A VARCHAR(255) column that holds the string "hello world," for example, only takes up 12 bytes (one byte for each character plus an extra byte to store the length).
You cannot store strings whose lengths are greater than the field length you have specified. With a VARCHAR(4) field, you can store at most a string with 4 characters. If you attempt to store the string "happy birthday," MySQL will truncate the string to "happ." The downside of the MySQL approach to variable-length text fields over the mSQL approach is that there is no way to store the odd string that exceeds your designated field size. Table 6-1 shows the storage space required to store the 144 character Wine Spectator URL shown above along with an average-sized 30 character URL.
Table 6-1. The Storage Space Required by the Different MySQL Character Types
If, after years of uptime with your database, you find that the world has changed and a field that once comfortably existed as a VARCHAR(25) now must be able to hold strings as long as 30 characters, you are not out of luck. MySQL provides a command called ALTER TABLE that enables you to redefine a field type without losing any data.
ALTER TABLE mytable MODIFY mycolumn LONGTEXT
220.127.116.11. Variable-length character fields in mSQL
You can skip this section if you are only interested in MySQL. Variable-length character fields in mSQL enable you to define a field's length to be the size of the average character string length it will hold. While every value you insert into this field will still take up at least the amount you specify, it can hold more. The database does this by creating an overflow table to hold the extra data. The downside of this approach comes in the form of performance and the inability to index variable-length fields.
Let's take a moment to examine the impact of different design choices with mSQL. In order to store all of the above URLs in a CHAR field, we would need to have a CHAR(144) column. Under this scenario, the four URLs in question would take up 576 bytes (144x3), even though you are only actually storing 216 bytes of data. The other 360 bytes is simply wasted space. If you multiple that times thousands or millions of rows, you can easily see how this becomes a serious problem. Using a variable-length TEXT(30) field, however, only 234 bytes (30x3+144) are required to store the 216 bytes of data. Only 18 bytes are wasted. That is a 41% savings!
6.3.3. Binary Datatypes
mSQL has no support for binary data. MySQL, on the other hand, provides a set of binary datatypes that closely mirror their character counterparts. The MySQL binary types are CHAR BINARY, VARCHAR BINARY , TINYBLOB , BLOB , MEDIUMBLOB, and LONGBLOB . The practical distinction between character types and their binary counterparts is the concept of encoding. Binary data is basically just a chunk of data that MySQL makes no effort to interpret. Character data, on the other hand, is assumed to represent textual data from human alphabets. It thus is encoded and sorted based on rules appropriate to the character set in question. Specifically, MySQL sorts binary in a case-insensitive, ASCII order.
6.3.4. Enumerations and Sets
MySQL provides two other special kinds of types with no mSQL analog. The ENUM type allows you specify at table creation a list of possible values that can be inserted into that field. For example, if you had a column named fruit into which you wanted to allow only "apple," "orange," "kiwi," or "banana," you would assign this column the type ENUM:
CREATE TABLE meal(meal_id INT NOT NULL PRIMARY KEY, fruit ENUM(`apple', `orange', `kiwi', `banana'))
When you insert a value into that column, it must be one of the specified fruits. Because MySQL knows ahead of time what valid values are for the column, it can abstract them to some underlying numeric type. In other words, instead of storing "apple" in the column as a string, it stores it as a single byte number. You just use "apple" when you call the table or when you view results from the table.
6.3.5. Other Kinds of Data
Every piece of data you will ever encounter can be stored using numeric or character types. Technically, you could even store numbers as character types. Just because you can do so, however, does not mean that you should do so. Consider, for example, storing money in the database. You could store that as an INT or a REAL. While a REAL might seem more intuitive -- money requires decimal places, after all -- an INT actually makes more sense. With floating point values like REAL fields, it is often impossible to capture a number with a specific decimal value. If, for example, you insert the number 0.43 to represent $0.43, MySQL and mSQL may store that as 0.42999998. This small difference can be problematic when applied to a large number of mathematical operations. By storing the number as an INT and inserting the decimal into the right place, you can be certain that the value represents exactly what you intend it to represent.
Isn't all of that a major pain? Wouldn't it be nice if MySQL and mSQL provided some sort of datatype specifically suited to money values? MySQL and, to a lesser degree, mSQL both provide special datatypes to handle special kinds of data. MONEY is an example of one of these kinds of data. DATE is another. For a full description of all datatypes, see Chapter 17, "MySQL and mSQL Programs and Utilities".
Copyright © 2001 O'Reilly & Associates. All rights reserved.