Issues in Writing Data to Databases (Web Database Applications with PHP & MySQL)

$Web Database Applications with PHP \& MySQL$

6.2. Issues in Writing Data to Databases

In this section, we discuss issues that emerge in database applications when multiple users access a database system; some users are inserting, updating, or deleting data, while others run queries.

To motivate the problems and solutions discussed here, consider an example. Imagine a user of the winestore wants to buy the last bottle of an expensive, rare wine that's in stock. The user browses the database and finds the wine. There is only bottle left, and the user quickly adds this to her shopping cart. The shopping cart is a row in the order table with only one related row in the items table. Now, the user decides to finalize the purchase and is presented with a summary of the shopping cart.

However, while the user fumbles about finding her password to log in, another user enters the system. This user quickly locates the same wine, sees that there is only one bottle left, adds it to his shopping cart, logs in to the system, and purchases the wine. When our first user finally logs in to finalize the order, all the details look fine, but the wine has actually been sold. Our database UPDATE operation to deduct from the inventory fails since the stock value is already zero, and we end up reporting an error to our original—now very unhappy and confused—user.

Consider another example. Imagine that one of our winestore stock managers wants to order 12 more bottles of a popular wine, but only if there are less than two dozen bottles currently in stock. The manager runs a query to sum the total stock for that wine from the inventory table. The result is that there are 15 bottles left, so the manager decides to place an order. However, he heads off to fill his coffee cup first, leaving the system displaying the query result.

A second stock manager arrives at her desk with the same intention: to order more of this popular wine if there are less than 24 bottles. The result of the query is the same: 15 bottles. The second manager orders a dozen bottles, and updates the inventory to 27, knowing the bottles will arrive in the afternoon. The problem occurs when the first manager returns: he doesn't rerun the query—why should he?—and he too orders 12 bottles and updates the inventory to 27. Now the system has record of 27 bottles, but two dozen will arrive in the afternoon to take the total to 39!

The first problem is a design issue—as well as an example of an unrepeatable read—and one that can be solved with more restrictive system requirements, knowledge of how the DBMS behaves, and some careful script development. The second is a classic problem—what textbooks describe as a lost update—and it requires more understanding of relational database problems and theory. We cover simple solutions to fundamental problems like these here, and discuss how MySQL implements locking for transactions, concurrency, and performance.

This section isn't intended as a substitute for a relational database text. Most textbooks contain extensive treatment of transaction and concurrency topics, and most of these are highly relevant to the state problems of web database applications.

6.2.1. Transactions and Concurrency

We have illustrated two examples of the problems users have when they access a web database at the same time (that is, concurrently). Allowing uncontrolled interleaving of SQL statements—where each of the users is reading and writing—can result in several well-known problems. The management of a group of SQL statements—we call these transactions—is one important area of the theory and practice of relational databases. Here are four of the more common problems of concurrent read and write transactions:

Lost update problem

Fortunately, most of these problems can be solved through locking or careful design of scripts that carry out database transactions. However, some problems may be deliberately unsolved in a particular system because they restrict the system requirements or add unnecessary complexity. We discuss locking in the next section.

6.2.2. Locking for Concurrency in MySQL

It has been shown that a simple scheme called locking—actually, two-phase locking—solves the four transaction problems identified in the last section.

6.2.2.1. When and how to lock tables

The first and most important point is that the primary use of locking is to solve concurrency problems. If scripts are being implemented that write to the database but aren't multistep operations susceptible to the problems described in the last section, locks aren't needed. Simple scripts that insert one row, delete one row, or update one row, and that don't use results of a previous SELECT or data entered by the user as input, don't require a lock.

TIP: Locking is required only when developing scripts that first read a value from a database and later write that value to the database. Locks are never required for self-contained insert, update, or delete operations such as updating a customer's details, adding a region to the region table, or unconditionally deleting an inventory. Locking may not be required for all parts of a web database application: parts of the application can still be safely used without violating any locking conditions.

Locks are variables with a special property. With its default settings, each MySQL table has an associated lock variable. If a user sets the lock variable for a particular table, no other user can perform particular actions on that table. The user who has set the lock variable holds the lock on the table. In practice, there are two kinds of locks for each table: READ LOCKs, when a user is only reading from a table, and WRITE LOCK s, when a user is both reading and writing to a table.

Having locks in a DBMS leads to four rules of use:

If a user wants to write to a table, and she is performing a transaction susceptible to a concurrency problem, she must obtain a WRITE LOCK on that table.
If a user only wants to read from a table, and he is performing a transaction susceptible to a concurrency problem, he must obtain a READ LOCK on that table.
If a user requires a lock, she must lock all tables used in the transaction.
A user must release all locks when a database transaction is complete.

When a user holds a WRITE LOCK on a table, no other users can read or write to that table. When a user holds a READ LOCK on a table, other users can also read or hold a READ LOCK, but no user can write or hold a WRITE LOCK on that table.

TIP: SELECT, UPDATE, INSERT, or DELETE operations that don't use LOCK TABLES don't proceed if locks are held that would logically prevent their operation. For example, if a user holds a WRITE LOCK on a table, no other user can issue a SELECT, UPDATE, INSERT, DELETE, or LOCK operation on that table.

The following segment of an interaction with the MySQL command interpreter illustrates the use of locks in a summarization task that requires locking:

mysql> LOCK TABLES items READ, temp_report WRITE;
mysql> SELECT sum(price) FROM items WHERE cust_id=1;
+------------+
| sum(price) |
+------------+
|     438.65 |
+------------+
1 row in set (0.04 sec)

mysql> UPDATE temp_report SET purchases=438.65
       WHERE cust_id=1;
mysql> UNLOCK TABLES;

In this example, a temporary table called temp_report is updated with the result of a SELECT operation on the items table. If locks aren't used, the items table can be modified by another user, possibly altering the summary value of $438.65 used as input to the UPDATE operation. There are two locks obtained for this transaction: first, a READ LOCK on items, since we don't need to change items but we don't want another user to make a change to it; and, second, a WRITE LOCK on temp_report, because we want to change the table, and we don't want other users to read or write to the report while we make changes. The UNLOCK TABLES operation releases all locks held; locks can't be progressively released.

It isn't permitted by MySQL to lock only one of the two tables used in the transaction above. The following rules apply to locks:

If a lock is held, all other tables that are to be used must also be locked. Failing to do so results in a MySQL error.
If aliases are used in queries—for example:
```
SELECT * from customer c where c.custid=1
```
the alias must be locked with:
```
LOCK TABLES customer c READ 
```
or:
```
LOCK TABLES customer c WRITE 
```
(depending on the transaction requirements).
If different aliases for the same table are used, each different alias must be locked.

In many cases—including those in which locking is required if the tasks are implemented intuitively—locking can be avoided. When designing transactions, careful use of mysql_insert_id( ) (as opposed to using max( ) to find the next available identifier), use of temporary summary tables, and updates that are relative (such as UPDATE customer SET discount=discount*1.1) are practical techniques to avoid using the output of previous SELECT statements.

6.2.2.2. The LOCK TABLES and UNLOCK TABLES statements in MySQL

The LOCK TABLES statement is used to lock the listed tables in either READ or WRITE mode. As discussed earlier, all tables that are accessed in the transaction must be locked in either READ or WRITE mode and must be listed in a single LOCK TABLES statement.

A script that issues a LOCK TABLES statement is suspended until all locks listed are successfully obtained. There is no time limit in waiting for locks. For locks that can't be immediately obtained—because the lock is held by another user or an operation is running on the table already—the request is placed at the back of either the write- or read-lock queue for the table, depending on the lock required. The write-lock queue has priority over the read-lock queue, so a user who wants a write lock obtains it when it becomes available, regardless of how long another user has been waiting in the read-lock queue. This is a design decision in MySQL that gives priority to database modifications over database queries.

NOTE: MySQL is designed to give writing priority over reading. Regardless of how long a user has been queued in the READ LOCK queue, any request in the WRITE LOCK queue receives priority. This can lead to a problem called starvation, where a transaction never completes because it can't obtain the required locks. However, since most web database applications read from databases much more than they write, and locks are required in only a few situations, starvation is very uncommon in practice.

If low-priority writing is essential to an application, a LOW_PRIORITY option can be prefixed before the WRITE clause. If a transaction is queued for a LOW_PRIORITY WRITE, it receives the lock only when the READ LOCK queue is empty and no other users are reading from the table. Again, consideration of possible starvation is important.

Locks can't be progressively obtained through several LOCK TABLES statements. Indeed, issuing a second LOCK TABLES is the same as issuing an UNLOCK TABLES to release all locks and then issuing the second LOCK TABLES. There are good reasons for the strictness of this related to a well-known locking problem called deadlock, which we don't discuss here. However, MySQL is deadlock-free because of the strictness and functionality of the LOCK TABLES and UNLOCK TABLES statements.[9]

[9]Deadlock is a difficult problem. As recently as Version 3.22.23 of MySQL, there were bug fixes to MySQL to avoid deadlocking problems in the DBMS.

WARNING: MySQL has a feature called INSERT DELAYED for insertion that is described in the MySQL manual.

Don't mix locking with INSERT DELAYED for insertion operations. The INSERT DELAYED process is carried out by the MySQL DBMS at a later time—under its own control—and the locks held by the user can't be used by the DBMS. INSERT DELAYED should be used only in situations in which locking isn't required.

6.2.2.3. Locking for performance

Locking is primarily designed to ensure that concurrent transactions can execute safely. However, locking is also a useful performance tool to optimize the performance of a particular transaction.

Consider, for example, a situation where we urgently require a complex report on the stock in the winestore that uses a slow SELECT statement with a GROUP BY, ORDER BY, and HAVING clause, and that joins together many tables. With other users running queries and using system resources, this query may run even slower. A solution is to use LOCK TABLES with the WRITE option to stop other users running queries or database updates, and to have exclusive access to the database for the query duration. This permits better optimization of the query processing by the DBMS, dedication of all the system resources to the query, and faster disk access.

The downside of locking for performance is the reduction in concurrent access to the database. Users may be inconvenienced by slow responses or timeouts from the web database application. Locking for performance should be used sparingly.

6.2.3. Locking Tables in Web Database Applications

Example 6-9 shows a PHP function, updateDiscount( ), that requires locking to ensure that the value returned from the SELECT query can't change before the UPDATE operation. The script is designed to be run either by the winestore system administrator—it would then require a <form> for user input—or as the final module in the ordering process for users. Another example that requires locking for winestore ordering is included in Chapter 12.

The script in Example 6-9 is designed to reward loyal customers. If the customer has spent a significant amount on an order at the winestore, a percentage discount is applied to her order. The function updateDiscount( ) forms the body of the script. It takes as parameters a cust_id, an order_id for that customer, a discount to apply to that order, and a threshold total. If the total amount spent by the user exceeds the threshold total, the discount is applied to the order.

Example 6-9. The updateDiscount function in which locking is required

  function updateDiscount($custId, $orderId,
                          $discount, $minimum,
                          $connection)
  {
     $ok = false;

     // Lock all tables required in this transaction
     $query = "LOCK TABLES items READ,
               orders WRITE, customer READ";

     if (!mysql_query($query, $connection))
        showerror( );

     // Run query to find out how much a user
     // has spent in this purchase
     $query = "SELECT SUM(price*qty)
               FROM items, orders, customer
               WHERE customer.cust_id =
                     orders.cust_id
               AND orders.order_id = items.order_id
               AND items.cust_id = orders.cust_id
               AND orders.order_id = $orderId
               AND customer.cust_id = $custId";

     if (!($result = mysql_query($query, $connection)))
        showerror( );

     // Get the $row with the total spent
     $row = mysql_fetch_array($result);

     // Is the amount spent more than the threshold?
     if ($row["SUM(price*qty)"] > $minimum)
     {
        // Yes, so give the customer a discount
        // for this order
        $query = "UPDATE orders
                  SET discount = $discount
                  WHERE cust_id = $custId
                  AND order_id = $orderId";

        if (!mysql_query($query, $connection))
            showerror( );

        $ok = true;
     }

     // Unlock the tables
     $query = "UNLOCK TABLES";

     if (!mysql_query($query, $connection))
        showerror( );

     // Return whether the discount was given or not
     return $ok;
  }

The locking of items, orders, and customer is performed before the query, and the UNLOCK TABLES statement is issued after the database update of the discount. As discussed in the last section, all tables and aliases that are used must be locked for either READ or WRITE. MySQL reports an error if, for example, items is accessed but not locked while orders and customer were locked. If an unlocked table needs to be accessed—or locking must be avoided for a particular table—a second DBMS connection can be opened and used.

6.2.3.1. Locking methods that don't work in web database applications

There are several locking paradigms that don't work in a web database application because of the statelessness of HTTP. Each approach fails, because there is either no guarantee or no possibility that the locked tables will be unlocked. If tables are locked indefinitely, other transactions can't proceed, and the DBMS will most likely need to be shut down and restarted.

WARNING: Be careful with locking in web database applications. Remember the basic rule that all locks should be unlocked by the same script during the same execution of the script.

All web scripts that require locking should have the structure lock, query, update, and unlock. There must be no user interaction or intervening calls to other scripts that require input.

The following approaches to transactions and locking in a web database application should be avoided:

Failing to issue an UNLOCK TABLES on a locked persistent database connection opened with mysql_pconnect( ). The locks aren't released when the script terminates, and there is no guarantee that the script will be run in the future or that the same persistent connection will be used again.

It isn't necessary to issue an UNLOCK TABLES if a nonpersistent connection opened with mysql_connect( ) is used. Locks are automatically released when the script finishes. However, it is good practice to include the UNLOCK TABLES statement.
Locking one or more tables during the first execution of a script, then querying or updating during a second or subsequent execution of the script. Remember that each database connection in a script is independent and is treated as a different user by MySQL. Such an approach queries and updates without locks unless, by chance, the same persistent connection that issued the locks is reused. A subsequent UNLOCK TABLES may fail.
Retrieving a value such as the next available primary key value, presenting this to the user, waiting for the user to enter further details, and then adding a row to the database with that identifier. Remember that another user may add a row while the first user is entering the required details, and locks should never be carried across several scripts or different executions of the same script.

6.2.3.2. Locking with an auxiliary table

If values must be shown to a user, consider adding a summary table for identifiers, or copying rows to a temporary table. For example, an identifier table can store the next available identifier for each other table, this can then be incremented by the script and the value can be used in subsequent scripts without locking problems and without any clashes in numbering.

This solution is shown in Example 6-10, using an auxiliary table named ids that manages the next available region_id attribute. The use of the additional table prevents duplicate rows being inserted, and avoids any problems with locking or updates.

Example 6-10. An auxiliary table manages the next region_id attribute

<?php
  // This code needs an auxiliary table called "ids"
  // that might be created with:
  // CREATE TABLE ids (
  // region_id int default 0,
  // other_id int default 0,
  // another_id int default 0
  // );
  // It has one row, and no primary key is required.
  // After creating the table, a row is needed,
  // so issue an: INSERT INTO ids (NULL, NULL, NULL);
  // (if it's being added later, use MAX( ) to get the
  //  correct ID values!)

  include 'db.inc';
  include 'error.inc';

  function getNextRegion ($connection)
  {
     // A nice way to do it... use an auxiliary table
     // Lock the auxiliary table
     $query = "LOCK TABLES ids WRITE";

     if (!mysql_query($query, $connection))
        showerror( );

     // Add one to the region_id attribute
     $query = "UPDATE ids SET region_id = region_id + 1";

     if (!mysql_query($query, $connection))
        showerror( );

     // Find out the new value of region_id
     $query = "SELECT * FROM ids";

     if (!($result = mysql_query($query, $connection)))
        showerror( );

     // Get the row that is returned
     $row = mysql_fetch_array($result);

     // Unlock the table
     $query = "UNLOCK TABLES";

     if (!mysql_query($query, $connection))
        showerror( );

     // Return the region_id
     return ($row["region_id"]);
  }
  // MAIN -----
  if (!($connection = @ mysql_connect($hostName,
                                    $username,
                                    $password)))
     die("Could not connect to database");

  if (!mysql_select_db($databaseName))
     showerror( );

  if (empty($regionId))
  {
     $regionId =
        getNextRegion($connection, $databaseName);
?>
<!DOCTYPE HTML PUBLIC
   "-//W3C//DTD HTML 4.0 Transitional//EN"
   "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <title>Insert a region</title>
</head>
<body bgcolor="white">
region_id: <?= $regionId ?>
<br>
<form method="post" action="example.6-10.php">
  <input type="hidden"
   name="regionId" value="<?=$regionId;?>">

  <br>region_name:
  <br><input type="text" name="regionName" size=80>
  <br>description:
  <br><textarea name="description" rows=4 cols=80>
      </textarea>
  <br><input type="submit">
</form>
</body>
</html>
<?php
  }
  else
  {
     $regionId = clean($regionId, 3);
     $regionName = clean($regionName, 20);
     $description = clean($description, 255);

     $query = "INSERT INTO region SET " .
       "region_id = " . $regionId . ", " .
       "region_name = \"" . $regionName . "\", " .
       "description = \"" . $description . "\"";

     if ((@ mysql_query ($query, $connection))
             && @ mysql_affected_rows( ) == 1)
        header("Location:insert_receipt.php?" .
               "values=$regionId&status=T");
     else
        header("Location: insert_receipt.php?status=F");
  }
?>

6.2.3.3. The table-level locking paradigm in MySQL

Until recently, MySQL supported only table locking. Other DBMSs support locking at other levels, including locking rows, groups of rows, attributes across all rows in a table, and disk pages.

A common argument against using MySQL has been that table locking is too heavy-handed and that it limits concurrency in web database applications. This isn't really true, except when there are specific requirements that are uncharacteristic of web database applications.

Table locking works particularly well in web database applications, where typically:

DELETE and UPDATE operations are on specific rows—most often accessed by the primary key value—and the rows are accessed through an index.
There are many more read operations than write operations.
Operations require locks on whole tables. Examples include GROUP BY operations, updates of sets of rows, and reading in most rows in a table.

By default, MySQL uses a type of table called MyISAM. Up to now, the MyISAM and heap have supported only table locking. However, three new database types have recently become supported by MySQL, and these have different locking paradigms:

The Berkeley Database (BDB) tables have disk page-level locking; the LOCK TABLES statement can still be used in BDB.
The InnoDB tables have row-level locking. They are designed to support very large volumes of data efficiently, and the locking mechanisms are designed to have low overheads.
The Gemini tables have both row- and table-level locking; unlike the other table types that can be used with MySQL, the Gemini table is covered by a commercial license and isn't free software.

Support for BDB and InnoDB tables must be compiled into MySQL during the installation process, and each requires MySQL 3.23.34 or a later version. The Gemini table type is a component of the commercially available NuSphere product range. Configuration of these table types is outside the scope of this book.

Interestingly, the MyISAM tables permit a limited form of concurrency that isn't immediately obvious with the table-locking paradigm. When a mix of select and write operations occur on a MyISAM table, MySQL automatically allows write operations to change copies of the data. Other SELECT statements being run by other users read the unchanged data and, when they are completed, the modified copies are written back to the database. This approach is known as data versioning.

6.2.3.4. Other locking paradigms

The row-locking paradigm is used in the InnoDB and Gemini table types, and is the dominant paradigm in other DBMSs. The BDB table type offers page locking, which is similar to locking selected rows.

Row or page locking works well in situations that are infrequently seen in web database applications, such as:

Transaction environments where a number of steps need to be undone or rolled back.
Many users are writing to the same tables concurrently.
Locks need to be maintained for long periods of time.

The drawbacks of row and page locking include:

Higher memory requirements to manage an increased number of locks

Poor performance, since there is much more locking and unlocking activity

Slow locking for operations that require locks on a whole table, such as GROUP BY operations

6.2.3.5. What isn't covered here

There are two significant topics related to transactions and concurrency that aren't covered in this chapter. We have omitted these topics because they are less important in web database applications than in traditional relational systems, and because this book isn't intended as a substitute for a good relational database text or the documentation of the MySQL DBMS.

The first is a more traditional treatment of transactions from a commit and rollback perspective. The InnoDB, BDB, and Gemini table types support a model where a statement can be issued to begin a transaction that consists of several database operations. On completion of the operations, a commit statement can be issued to write the changes to the database and verify that these changes have occurred. If, for some reason, the operations need to be undone—for example, when a user presses Cancel—a rollback command can be issued to return the database to its original state.

Commit and rollback processing is useful, but it can be argued that it is less interesting in the stateless HTTP environment, in which operations need to be as independent as possible. For most practical purposes in web database applications, complex transactional processing isn't required.

The second topic we have not covered is recovery. Database recovery techniques are based on logging, in which database changes are written to a file that can be used for transaction rollback and for DBMS system recovery. MySQL does support logging for recovery, and more details can be found in the MySQL manual.