
6.3. Transactions, Locking, and Isolation

The final section of this chapter deals with the important (and hair-raising!) topic of transaction handling.

Transaction handling is a feature of the more powerful database systems in which SQL statements can be grouped into logical chunks. Each chunk is known as a transaction, and the operations it performs are guaranteed to be atomic for the purposes of recovery. According to the ANSI/ISO SQL standard, a transaction begins with the first executable SQL statement and ends when it is explicitly committed or rolled back.

Committing writes the changes into the database tables and makes them visible to other concurrent users. Rolling back discards any changes that have been made to any tables since the beginning of the current transaction.

The standard example to explain transactions is a bank transfer in which a customer transfers $1000 from one bank account to another. The bank transfer consists of three distinct stages:

  1. Decrease the source account by the required amount.

  2. Increase the target account by the required amount.

  3. Write a journal entry recording the transfer.

When viewed as three separate stages, the possibility of disaster is quite obvious. Suppose there's a power outage between stages 1 and 2. The hapless customer is now $1000 poorer, as the money has neither reached the target account nor been logged in the transfer journal. The bank is now $1000 richer.[62]

[62]No wonder the money is decremented first.

Of course, if the power outage occurred between stages 2 and 3, the customer would have the correct amount of money in the correct accounts, but the bank would have no record of the transfer. This would lead to all sorts of book-keeping problems.

The answer is to treat the three separate stages as one logical unit, or transaction. Thus, when stage 1 starts executing, the transaction is started automatically. The same transaction continues until stage 3 is completed, after which point the transaction can be terminated with all the changes either being committed to the database or rolled back and discarded. Therefore, if a power outage happens at any point during the transaction, the entire transaction can be automatically rolled back when the database restarts and no permanent changes to the data will have been made.

A transaction is an all-or-nothing situation. Either it all works, or it all fails -- which is great news for our luckless bank customer.

Transactions are also sometimes described as having A.C.I.D. properties:

Atomic

A transaction's changes to the database are atomic: either all happen or none happen.

Consistent

A transaction is a correct transformation of the state. The actions taken as a group do not violate any of the integrity constraints associated with the state.

Isolated

Even though transactions can execute concurrently, it appears to each transaction that others executed either before or after it.

Durable

Once a transaction completes successfully (e.g., commit() returns success), then its changes to the state of the database survive any later failures.

Implementing ACID transaction handling within a database requires the use of a journal log file, along with some sophisticated techniques and much careful coding. That's why it's rare to find ACID transactions supported on free databases (with the notable exception of PostgreSQL), and why it carries a performance penalty when it is supported.

On the upside, full transaction handling brings with it far greater safety from power failures, client failures, database failures, and other popular forms of disaster. Simple explicit locking mechanisms do not afford the same level of safety and recoverability, as we shall discuss later.

Since not all database systems support transaction processing, you may not have the luxury of being able to roll back inadvertent data corruption or be saved from a power outage. But if your database does support transactions, the DBI makes it easy to manage them in a portable manner.

6.3.1. Automatic Transaction Handling

The ISO standard for SQL defines a particular model for transactions. It says that a database connection is always in a transaction. Each and every transaction will end with either a commit or a rollback, and every new transaction will begin with the next statement executed. Most systems also define an auto-commit mechanism that acts as if commit() is automatically called after each statement.

The DBI standard tries to find a way to let all drivers for all databases appear to offer the same facilities, as much as possible. It does this by relying on the fact that there's little practical difference between a database that supports transactions but has auto-commit enabled and a database that doesn't support transactions at all.

The DBI standard also tries to ensure that an application written to require transactions can't accidentally be run against a database that doesn't support them. It does this by treating an attempt to disable auto-commit as a fatal error for such a database.

Given that the ability to enable and disable auto-commit is important, the DBI defines a database handle attribute called AutoCommit that regulates whether or not DBI should appear to automatically force a data commit after every statement.

For example, if you issue a statement like $dbh->do() that deletes some data within your database, and AutoCommit is set to a true value, you cannot roll back your change even if the database supports transactions.

The DBI defaults to having AutoCommit enabled, making this potentially dangerous behavior automatic unless explicitly disabled. This is due to the precedent set by ODBC and JDBC. It was probably a mistake for the DBI to put standards compliance above safety in this case. A future version may issue a warning if AutoCommit is not specified as an attribute to DBI->connect(), so it's worth getting used to adding it now.
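
For example, making the choice explicit at connect time is just a matter of adding the attribute. Here's a minimal sketch, reusing the example Oracle database that appears later in this chapter:

### Connect with AutoCommit stated explicitly rather than left to default
my $dbh = DBI->connect( "dbi:Oracle:archaeo", "username", "password", {
    AutoCommit => 1,   # we really do want commit-per-statement behavior
} );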

The behavior of changing this attribute depends on which type of transaction handling your database supports. There are three possibilities:

No transaction support

Databases that have no transaction support are treated as always having AutoCommit enabled. Attempting to disable AutoCommit will result in a fatal error.

Always active transaction support

This group of databases includes mainstream commercial RDBMS products, such as Oracle, that support the ANSI/ISO standard for transaction behavior.

If AutoCommit is switched from being enabled to disabled, no immediate actions should occur. Any future statements that you issue become part of a new transaction that must be committed or rolled back.

If AutoCommit is switched from being disabled to enabled, any outstanding database changes will be automatically committed.

Explicit transaction support

Some databases, such as Informix, support the idea that transactions are optional and must be explicitly started by applications when required.

The DBI attempts to treat these systems as systems that have always active transactions. To accomplish this, the DBI requires the driver to automatically begin a transaction when AutoCommit is switched from enabled to disabled. Once a transaction has been committed or rolled back, the driver will automatically start a new transaction.

Therefore, despite its database independence, the DBI offers both simple automatic transaction committing and powerful manual transaction processing modes.
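
For example, switching to manual transaction mode at runtime is just a matter of changing the attribute. The following sketch traps the fatal error raised by drivers with no transaction support; the accounts table is hypothetical, echoing the bank transfer example from earlier:

### Try to disable AutoCommit; this dies if the database
### has no transaction support at all
eval { $dbh->{AutoCommit} = 0 };
die "Transactions are not supported here: $@" if $@;

### These two statements now form part of a single transaction...
$dbh->do("UPDATE accounts SET balance = balance - 1000 WHERE id = 1");
$dbh->do("UPDATE accounts SET balance = balance + 1000 WHERE id = 2");

### ...which we make permanent here; a new transaction then begins
$dbh->commit();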

6.3.4. Disconnecting, One Way or Another

The transaction effect of explicitly disconnecting from a database while AutoCommit is disabled is, sadly, undefined. Some database systems, such as Oracle and Ingres, will automatically commit any outstanding changes. However, other database systems, such as Informix, will roll back any outstanding changes. Because of this, applications not using AutoCommit should always explicitly call commit() or rollback() before calling disconnect().
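
In other words, a tidy shutdown sequence for an application running with AutoCommit disabled looks something like this:

$dbh->commit();      # or $dbh->rollback() to discard outstanding changes
$dbh->disconnect();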

So what happens if you don't explicitly call disconnect(), or don't have the chance to because the program exits after a die? Well, because DBI handles are object references, we can be sure that Perl itself will call the DESTROY method for us on each handle if the program exits, the handle goes out of scope, or the only copy of a handle is overwritten by another value.

The actual implementation of the DESTROY method is in the hands of the driver author. If the database handle is still connected, then it should automatically call rollback() (unless AutoCommit is enabled) before calling disconnect(). Calling rollback() in DESTROY is critical. If the driver doesn't, then a program aborting due to a die part way through a transaction may actually "accidentally" commit the incomplete transaction! Fortunately, all the drivers that we're aware of that support transactions do the right thing.

As an extra sanity check, if you disconnect from a database while you still have active statement handles, you will get a warning. We discussed active statement handles and related topics in Chapter 5, "Interacting with the Database".
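
If you know a statement handle may still be active at that point, you can tell the DBI you're finished with it before disconnecting, which avoids the warning. A minimal sketch:

$sth->finish() if $sth->{Active};   # we won't fetch any more rows
$dbh->disconnect();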

6.3.5. Combining Automatic Error Handling with Transactions

Transactions, as you've probably realized by now, are closely related to error handling. This is especially true when you have to clean up after an error by putting everything in the database back to the way it was before the transaction started.

In Chapter 4, "Programming with the DBI", we discussed error handling in some detail and sang the praises of using the RaiseError attribute for automatic error detection.

Imagine combining the automatic error detection of the DBI's RaiseError attribute, the error trapping of Perl's eval { ... } blocks, and the error handling properties of transactions. The result is a simple yet powerful way to write robust applications in Perl.

There is a fairly common structure to this kind of application, so to help us discuss the issues, we've included the following example.

This outline example processes CSV files containing sales data from one country. It fetches currency exchange rate information from a web site and adds that to the data, then performs a series of inserts, selects, updates, and further inserts to load the data into the database. That processing is repeated for a series of countries.

Here's the code:

### Connect to the database with transactions and error handling enabled
my $dbh = DBI->connect( "dbi:Oracle:archaeo", "username", "password", {
    AutoCommit => 0,
    RaiseError => 1,
} );

### Keep a count of failures. Used for program exit status
my @failed;

foreach my $country_code ( qw(US CA GB IE FR) ) {

    print "Processing $country_code\n";

    ### Do all the work for one country inside an eval
    eval {

        ### Read, parse and sanity check the data file (e.g., using DBD::CSV)
        my $data = load_sales_data_file( "$country_code.csv" );

        ### Add data from the Web (e.g., using the LWP modules)
        add_exchange_rates( $data, $country_code,
                            "http://exchange-rate-service.com" );

        ### Perform database loading steps (e.g., using DBD::Oracle)
        insert_sales_data( $dbh, $data );
        update_country_summary_data( $dbh, $data );
        insert_processed_files( $dbh, $country_code );

        ### Everything done okay for this file, so commit the database changes
        $dbh->commit();

    };

    ### If something went wrong...
    if ($@) {

        ### Tell the user that something went wrong, and what went wrong
        warn "Unable to process $country_code: $@\n";
        ### Undo any database changes made before the error occurred
        $dbh->rollback();

        ### Keep track of failures
        push @failed, $country_code;

    }
}
$dbh->disconnect();

### Exit with useful status value for caller
exit @failed ? 1 : 0;

In the following list, we make some observations about how and why the code is structured the way it is and discuss a few related issues:

Unit of work

A key design issue is what the "unit of work" should be. In other words, how much work should we do before committing and thus how much would be undone by a rollback on error? The smallest unit should correspond to the smallest logically complete change to the database. In our example, that corresponds to the complete processing of one country file, and that's what we've chosen as the unit of work here.

We could have opted for a larger unit of work. Processing all the files as one unit would be another obvious choice. In that case, we'd simply need to move the foreach loop to inside the eval, as sketched below. Be aware, though, that most databases have limits on the amount of changes you can make without committing. The limit is usually large, and always configurable, but it may bite you if you try to do too much in one transaction.
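
For example, the all-files-as-one-unit variant of our example would look something like this. It's a sketch of the control flow only, reusing the functions from the example above:

### One transaction covering all countries: any failure rolls back everything
eval {
    foreach my $country_code ( qw(US CA GB IE FR) ) {
        print "Processing $country_code\n";
        my $data = load_sales_data_file( "$country_code.csv" );
        add_exchange_rates( $data, $country_code,
                            "http://exchange-rate-service.com" );
        insert_sales_data( $dbh, $data );
        update_country_summary_data( $dbh, $data );
        insert_processed_files( $dbh, $country_code );
    }
    ### Commit only after every file has been processed
    $dbh->commit();
};
if ($@) {
    warn "Unable to process all countries: $@\n";
    eval { $dbh->rollback() };
}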

Where to commit

It's important to have the commit() inside the eval. The commit call is the most critical part of the transaction. Don't assume that the commit() will succeed just because the previous statements didn't return an error. Databases are free to defer much of the real work till commit is called.

The commit() call should be the very last thing before the end of the eval block. Sometimes it's more tricky. Imagine that the requirements changed and you were asked to make the script delete the files as it processed them. Where would you put the unlink() call? Before or after the commit()? Think about that for a moment. Remember that there's always the risk that either the commit() or the unlink() could fail. You need to weigh up both the risks and the after effects of either case.

Here's how it applies in our example: if you commit first and the unlink fails, then you'll process the file again the next time the script is run. If you unlink first and the commit fails, then you've lost the data. In this case, the lesser of the two evils is clearly to commit first and risk double processing, especially as double processing is probably easy to avoid by checking the data in the file against what's already in the database.
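
In code, the safer ordering would look something like this, where $file is a hypothetical variable holding the name of the data file just processed:

### Commit first: the worst case is processing the file again next run
$dbh->commit();
unlink( $file )
    or warn "Processed $file but couldn't delete it: $!\n";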

In the real world, things can be rather more complex. However, there are plenty of creative ways to address this two system commit problem. The most important thing to remember is that there is a problem to address whenever some change outside the database has to be committed at the same time that the database changes are committed.

When things go wrong

The first thing to do in your if ($@) {...} block is to print an error message. The error message code helps document what the error handling block is dealing with. And doing it first avoids the risk that another fatal error will happen before your message is printed, thus masking the underlying problem.

Please do yourself and your users a favor by always including as much useful information in your error messages as possible. It sounds simple, but over and over again we see code like:

$dbh->commit() or die "commit failed!"; # DUMB!

Using RaiseError helps here because it generates a message (or Perl $@ variable value) that includes the underlying error from the driver, and the driver and method names.[63]

[63]It also includes the filename and line number of the code where the error happened, but you can strip that off using a regular expression if you like.

So, if you catch an error using eval, remember to print the contents of $@. But don't stop at that. In most applications, there are variables that indicate what's being processed at the time. Including one or more of those, like $country_code in our example, adds valuable context to the error message.

As a general guide, every die or warn should have at least one variable interpolated into it, not counting $! or $@.
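
For example, a more helpful version of the "DUMB" message shown earlier, for a handle without RaiseError enabled, might read:

$dbh->commit()
    or die "Commit of sales data for $country_code failed: $DBI::errstr";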

Consider protecting the rollback

So things have gone wrong, you've printed your error message, now it's time to rollback(). Pause for a moment. Remember that the database handle you're using has RaiseError set. That means that if the rollback itself fails then a new exception will be "thrown" and the script will immediately exit or jump into the arms of an enclosing eval.

Ask yourself if you're happy for a rollback error to trigger another exception. If not, then wrap it in an eval like this:

eval { $dbh->rollback };

Likely reasons for rollback() failing include database server shutdown or network communications failure, either of which may have been the cause of the error you're handling now. A rollback failure generally means that the database server is having a problem, and it's not worth trying to continue, so the default behavior is often what you want.
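
If you'd rather keep going, a protected rollback that downgrades the failure to a warning might look like this:

eval { $dbh->rollback() };
warn "Rollback failed (continuing anyway): $@" if $@;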

Exit status

Returning a reliable success/fail exit status from your scripts, even if you don't think you'll need it, is just good design. We recommend good design.

Death by misadventure

One of the important things to remember with transactions is that any calls made within an eval block can cause the program to die, and many of these calls may be totally unrelated to DBI. Therefore, by using an eval to trap any sort of problem whatsoever, you guarantee that you can cleanly roll back any incomplete transactions.


