home | O'Reilly's CD bookshelfs | FreeBSD | Linux | Cisco | Cisco Exam  


Book HomeManaging and Using MySQLSearch this book

3.9. Advanced Features

Using the SQL presented thus far in this chapter should handle 90% of your database programming needs. On occasion, however, you will need some extra power not available in the basic SQL functionality. We close out the chapter with a discussion of a few of these features.

3.9.1. Full Text Searching

MySQL introduced the ability to search on text elements within a text field in Version 3.23.23 through a special index called a FULLTEXT index. It specifically enables you to do something like:

INSERT INTO Document (url, page_text )
VALUES ('index.html', 'The page contents.');
SELECT url FROM Document WHERE MATCH ( page_text ) AGAINST ('page');

INSERT adds a row to a Document table containing the URL of a web page and its text content. SELECT then looks for the URLs of all documents with the word page embedded in their text.

3.9.1.2. Relevance values

The search that occurs here is a natural language search against the text in the specified columns. It is case insensitive. The result of the match is actually a relevance value that MySQL uses to rank the results. By default, MySQL shows the results with the most relevant results listed first and eliminates only those results with no relevance at all.

mysql> SELECT url FROM Document 
    -> WHERE MATCH ( title, page_text ) AGAINST ('java');
+------------+
| url        |
+------------+
| java.html  |
+------------+

You can use the relevance to your advantage. To get a better picture of the relevance values, you can execute the following query:

mysql> SELECT url, WHERE MATCH ( title, page_text ) AGAINST ('java') 
    -> FROM Document;
+------------+---------------------------------------------+
| url        | MATCH ( title, page_text ) AGAINST ('java') |
+------------+---------------------------------------------+
| index.html |                                           0 |
| java.html  |                             1.8016148151891 |
| perl.html  |                                           0 |
| c.html     |                                           0 |
+------------+---------------------------------------------+

In this case, the index.html file is a web page about MySQL and java.html is about how MySQL and Java work together. As you might expect, the results show that index.html has no relevance to Java, while java.html has quite a bite of relevance.

TIP: In the above example, you can include MATCH in both the SELECT clause and the WHERE clause without incurring any extra overhead. MySQL is smart enough to notice that the two matches are identical and thus execute them a single time. Using MATCH in both places is useful when you want to make use of relevance.

You might expect that a match against MySQL would turn up high relevance for both documents. In reality, however, it turns up zero relevance. Because the phrase MySQL is present in more than half the rows, it is considered a stopword and thus discounted. A stopword is simply a word with no value for text matching. Common stopwords are "the" and "but." They are critical to achieving meaningful results in a full text search. Unfortunately, our sample database has four rows about MySQL and thus MySQL itself is considered a stopword.

3.9.1.3. Boolean mode

MySQL 4.0.1 introduced the ability to perform more complex searches with MySQL full text searching using much the same syntax you would use in Internet search engines. These complex searches are called Boolean mode searches. To execute a Boolean mode search, you use the same syntax except:

For example:

SELECT url, title FROM Document
WHERE MATCH ( title, page_text ) AGAINST ( '+MySQL -Java' IN BOOLEAN MODE );

This query enables you to look for all documents that include MySQL but exclude Java. Common words that might otherwise be used as stopwords can be used in Boolean mode searches. Without any modifiers, a term in a Boolean mode search is considered optional. You can modify your Boolean mode searches with the following operators:

+
The word must be in the matched index.

-
The word must not be in the matched index.

~
The word's relevance is inverted. This operator is useful for removing words that MySQL is giving high relevance to, but you do not want in your hits. Unlike the - operator, returned rows can have the term in question.

<
Decreases the word's contribution to the overall relevance of a match.

>
Increases a word's contribution to the overall relevance of a match.

*
This operator comes after the word it modifies. It works much like a wildcard, matching any words that start with the specified word.

()
Groups words into subexpressions.

Now you can run complex queries such as:

SELECT url, title FROM Document
WHERE MATCH ( title, page_text ) AGAINST ( '+MySQL -optimiz* +(>Java <Perl)' IN BOOLEAN MODE);

This query asks for all of the rows about MySQL and either Java or Perl that do not have words beginning with optimiz. It will then rank Java documents higher than Perl documents.

3.9.2. Transactions

MySQL recently introduced transactions along with SQL for executing statements in a transactional context. By default, MySQL is in a state called autocommit. Autocommit mode means that any SQL you send to MySQL is executed immediately. In some cases, however, you may want to execute two or more SQL statements together as a single unit of work.

A transfer between two bank accounts is the perfect example of such a transaction. The banking system needs to make sure that the debit from the first account and the credit to the second account occur as a single unit of work. If they are treated separately, the server could in theory crash between the debit and the credit. The result would be that you would lose that money!

By making sure the two statements occur as a single unit of work, transactions ensure that the first statement can be "rolled back" in the event that the second statement fails. To use transactions in MySQL, you first need to create a table using a transactional table type such as BDB or InnoDB.[10]

[10]Not all platforms support these table types.

If your MySQL install was not compiled with support for these table types, you cannot use transactions unless you reinstall. The SQL to create a transactional table specifies one of the transactional types:

CREATE TABLE ACCOUNT (
    ACCOUNT_ID BIGINT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
    BALANCE    DOUBLE) 
TYPE = BDB;

For a transaction against a transactional table to work, you need to turn off autocommit. You can do this with the command:

SET AUTOCOMMIT=0;

Now you are ready to begin using MySQL transactions. Transactions start with the BEGIN command:

BEGIN;

Your mysql client is now in a transactional context with respect to the server. Any change you make to a transactional table will not be made permanent until you commit it. Changes to nontransactional tables, however, will take place immediately. In the case of the account transfer, we issue the following statements:

UPDATE ACCOUNT SET BALANCE = 50.25 WHERE ACCOUNT_ID = 1;
UPDATE ACCOUNT SET BALANCE = 100.25 WHERE ACCOUNT_ID = 2;

Once you're done with any changes, complete the transaction by using the COMMIT command:

COMMIT;

The true advantage of transactions, of course, comes into play should an error occur in executing the second statement. To abort the entire transaction before a commit, issue the ROLLBACK command:

ROLLBACK;

In reality, the logic behind such complex transactional operations, including commits and rollbacks, requires solid design and well-structured error handling. We will cover these programmatic elements of transaction management in Chapter 8.

Of course, it would be useful if MySQL performed the actual math. It can do just that as long as you store the values you want with a SELECT call:

SELECT @FIRST := BALANCE FROM ACCOUNT WHERE ACCOUNT_ID = 1;
SELECT @SECOND := BALANCE FROM ACCOUNT WHERE ACCOUNT_ID = 2;
UPDATE ACCOUNT SET BALANCE = @FIRST - 25.00 WHERE ACCOUNT_ID = 1;
UPDATE ACCOUNT SET BALANCE = @SECOND + 25.00 WHERE ACCOUNT_ID = 2;

In addition to the COMMIT command, a handful of other commands will automatically end any current transaction as if a COMMIT had been issued. These commands are:

Chapter 8 covers some of the more intricate details of using transactions in database applications.

3.9.3. Table Locking

Table locking is the poor man's transaction. In short, MySQL lets you lock down a group of tables so that only a single client can use it. Unlike transactions, you are not limited by the type of table. You cannot, however, roll back any actions taken against a locked table.

Locking has two basic functions:

  • Enables multiple statements to execute against a group of tables as one unit of work

  • Enables multiple updates to occur faster under some circumstances

MySQL supports three kinds of locks: read, read local, and write. Both kinds of read locks lock the table for reading by a client and all other clients. As long as the lock is in place, no one can write to the locked tables. Read and read local locks differ in that read local allows a client to execute nonconflicting INSERT statements as long as no changes to the MySQL files from outside of MySQL occur while the lock is held. If changes might occur by agents outside of MySQL, a read lock is required.

A write lock locks the specified tables against all access—read or write—by any other client. To lock a table, use the following command:

LOCK TABLES ACCOUNT WRITE;

Now that the ACCOUNT table is locked, you can read from it and modify the data behind it and be certain that no one else will change the data you read between your read and write operations:

SELECT @BAL:=BALANCE FROM ACCOUNT WHERE ACCOUNT_ID = 1;
UPDATE ACCOUNT SET BALANCE = @BAL * 0.03 WHERE ACCOUNT_ID = 1;

Finally, you need to release the locks:

UNLOCK TABLES;

It is really important that you unlock the tables! Failing to do so can result in preventing further access to those tables. Finally, any table locking should be short lived. Long-lived locks seriously degrade database performance.

3.9.4. Functions

Functions in SQL are similar to functions in other programming languages such as C and Perl. The function takes zero or more arguments and returns some value. For example, the function SQRT(16) returns 4. Within a MySQL SELECT statement, functions may be used in one of two ways:

As a value to be retrieved
This form involves a function in the place of a column in the list of columns to be retrieved. The return value of the function, evaluated for each selected row, is part of the returned result set as if it were a column in the database.[11]

[11]You can use aliasing, covered earlier in the chapter, to give the resulting columns "friendly" names.

For example:

SELECT name, FROM_UnixTIME(date) 
FROM events
WHERE time > 90534323

This query selects the name of each event and the date of the event formatted in human-readable form for all events more recent than the given time. FROM_UnixTIME( ) transforms a standard Unix time value into a human-readable form.[12]

[12]Remember that SQL is case insensitive. This particular function is simply written FROM_UnixTIME( ) by convention. You can use FROM_UNIXTIME() or From_UnixTime( ) if they feel more natural to you.

# The LENGTH( ) function returns the character length of
# a given string.
SELECT title, text, LENGTH(text)
FROM papers
WHERE author = 'Stacie Sheldon'

This query selects the title of a paper, the full text of the paper, and the length of the text in bytes for all of the papers authored by Stacie Sheldon. The LENGTH( ) function returns the character length of a given string.

As part of a WHERE clause
This form involves a function used in place of a constant when evaluating a WHERE clause. The value of the function is used for comparison for each row of the table. For example:

SELECT name 
FROM entries 
WHERE id = ROUND( (RAND( )*34) + 1 )

This query randomly selects the name of an entry from a pool of 35 entries. The RAND( ) function generates a random number between 0 and 1. This random value is then multiplied by 34 to turn the value into a number between 0 and 34. Incrementing the value by 1 provides a number between 1 and 35. The ROUND( ) function rounds the result to the nearest integer. The result is a whole number between 1 and 35 and will therefore match one of the ID numbers in the table.

SELECT name, FROM_UnixTIME(date)
FROM events
WHERE time > (Unix_TIMESTAMP( ) - (60 * 60 * 24) )

You may use functions in both the value list and the WHERE clause. This query selects the name and date of each event less than a day old. With no arguments, the UNIX_TIMESTAMP( ) function returns the current time in Unix format.

SELECT name
FROM people
WHERE password = ENCRYPT(name, LEFT(name, 2))

You may also use the value of a table field within a function. This example returns the names of people who used their names as passwords. The ENCRYPT( ) function returns a Unix password-style encryption of the specified string using the supplied two-character salt. The LEFT( ) function returns the left-most n characters of the specified string.

3.9.4.1. Date functions

The most common functions you use will likely be the MySQL functions that enable you to manipulate dates. You already saw some of these functions earlier for translating a Unix-style date into a human-readable form of the date. MySQL, of course, provides more powerful functions for doing things such as calculating the time between two dates:

SELECT TO_DAYS(NOW( )) - TO_DAYS('2000-12-31');

This example provides the number of days that have passed in this millennium. The NOW( ) function returns the DATETIME representing the moment in time when the command was executed. Less obviously, the TO_DAYS( ) function returns the number of days since the year 1 B.C., represented by the specified DATE or DATETIME.[13]

[13]MySQL is actually incapable of representing this date. Valid date ranges in MySQL are from January 1, 1000, to December 31, 9999. There is also no support in MySQL for alternative calendar systems such as the Hebrew, Chinese, or Muslim calendars.

Not everyone likes to see dates formatted the way MySQL provides them by default. Fortunately, MySQL lets you format dates to your own liking using the DATE_FORMAT function. It takes a DATE or DATETIME and a format string indicating how you want the date formatted:

mysql> SELECT DATE_FORMAT('1969-02-17', '%W, %M %D, %Y');
+--------------------------------------------+
| DATE_FORMAT('1969-02-17', '%W, %M %D, %Y') |
+--------------------------------------------+
| Monday, February 17th, 1969                |
+--------------------------------------------+
1 row in set (0.39 sec)

Chapter 15 contains a full list of valid tokens for the DATE_FORMAT( ) function.

3.9.5. Outer Joins

MySQL supports a more powerful joining than the simple inner joins we saw earlier. Specifically, MySQL supports something called a left outer join (also known as simply an outer join), which you specify with the keywords LEFT JOIN. This type of join is similar to an inner join, except that it includes data in the first column named that does not match any in the second column. If you remember our author and book tables from earlier in the chapter, you will remember that our join would not list any authors who did not have books in our database. You may want to show entries from one table that have no corresponding data in the table to which you are joining. That is where an outer join comes into play:

SELECT book.title, author.name 
FROM author
LEFT JOIN book ON book.author = author.id

This query is similar to the inner join that you already understand:

SELECT book.title, author.name
FROM author, book
WHERE book.author = author.id

Note that an outer join uses the keyword ON instead of WHERE. The key difference in results is that the new syntax of the outer join will include authors such as Neil Gaiman, for whom no book is in our database. The results of the outer join would therefore look like this:

+----------------+----------------+
| book.title     | author.name    |
+----------------+----------------+
| The Green Mile | Stephen King   |
| Guards, Guards!| Terry Pratchett|
| Imzadi         | Peter David    |
| Gold           | Isaac Asimov   |
| Howling Mad    | Peter David    |
| NULL           | Neil Gaiman    |
+----------------+----------------+

MySQL takes this concept one step further by using a natural outer join. A natural outer join will combine the rows from two tables that have identical column names with identical types and identical values:

SELECT my_prod.name 
FROM my_prod
NATURAL LEFT JOIN their_prod

This natural join will list all product names with identical entries in the my_prod and their_prod tables.

3.9.7. Batch Processing

Batch loading is the act of loading a lot of data into or pulling a lot of data out of MySQL all at once. MySQL supports two types of batch loading.

3.9.7.2. The LOAD command

The LOAD command enables you to load data from a file containing only data (no SQL commands). For example, if you had a file containing the names of all the books in your collection with one book on each line and the title and author separated by a tab, you could use the following command to load that data into your book table:

LOAD DATA LOCAL INFILE 'books.dat' INTO TABLE BOOK;

This command assumes that the file books.dat has one line for each database record to be inserted. It further assumes that there is a value for every column in the table or \N for null values. So, if the BOOK table has three columns, each line of books.dat should have three tab-separated values.

The LOCAL keyword tells the mysql command line to look for the file on the same machine as the client.[14] Without it, MySQL looks for the file on the server. Of course, if you are trying to load something on the server, you need to have been granted the special FILE privilege. Finally, keep in mind that nonlocal loads refer to files relative to the installation directory of MySQL.

[14]Reading from files local to the client is available only on MySQL 3.22.15 and later.

If you have a comma-separated value file such as an Excel file, you can change the delimiter of the LOAD command:

LOAD DATA LOCAL INFILE 'books.dat' 
INTO TABLE BOOK 
FIELDS TERMINATED BY ',';

If a file contains values that would cause duplicate records in the database, you can use the REPLACE and IGNORE keywords to dictate the correct behavior. REPLACE will cause the values from the file to replace the ones in the database, where the IGNORE keyword will cause the duplicate values to be ignored. The default behavior is to ignore duplicates.



Library Navigation Links

Copyright © 2003 O'Reilly & Associates. All rights reserved.