home | O'Reilly's CD bookshelfs | FreeBSD | Linux | Cisco | Cisco Exam  


Practical mod_perlPractical mod_perlSearch this book

Chapter 20. Relational Databases and mod_perl

Nowadays, millions of people surf the Internet. There are millions of terabytes of data lying around, and many new techniques and technologies have been invented to manipulate this data. One of these inventions is the relational database, which makes it possible to search and modify huge stores of data very quickly. The Structured Query Language (SQL) is used to access and manipulate the contents of these databases.

Let's say that you started your web services with a simple, flat-file database. Then with time your data grew big, which made the use of a flat-file database slow and inefficient. So you switched to the next simple solution—using DBM files. But your data set continued to grow, and even the DBM files didn't provide a scalable enough solution. So you finally decided to switch to the most advanced solution, a relational database.

On the other hand, it's quite possible that you had big ambitions in the first place and you decided to go with a relational database right away.

We went through both scenarios, sometimes doing the minimum development using DBM files (when we knew that the data set was small and unlikely to grow big in the short term) and sometimes developing full-blown systems with relational databases at the heart.

As we repeat many times in this book, none of our suggestions and examples should be applied without thinking. But since you're reading this chapter, the chances are that you are doing the right thing, so we are going to concentrate on the extra benefits that mod_perl provides when you use relational databases. We'll also talk about related coding techniques that will help you to improve the performance of your service.

From now on, we assume that you use the DBI module to talk to the databases. This in turn uses the unique database driver module for your database, which resides in the DBD:: namespace (for example, DBD::Oracle for Oracle and DBD::mysql for MySQL). If you stick to standard SQL, you maximize portability from one database to another. Changing to a new database server should simply be a matter of using a different database driver. You do this just by changing the data set name string ($dsn) in the DBI->connect( ) call.

Rather than writing your queries in plain SQL, you should probably use some other abstraction module on top of the DBI module. This can help to make your code more extensible and maintainable. Raw SQL coupled with DBI usually gives you the best machine performance, but sometimes time to market is what counts, so you have to make your choices. An abstraction layer with a well-thought-out API is a pleasure to work with, and future modifications to the code will be less troublesome. Several DBI abstraction solutions are available on CPAN. DBIx::Recordset, Alzabo, and Class::DBI are just a few such modules that you may want to try. Take a look at the other modules in the DBIx:: category—many of them provide some kind of wrapping and abstraction around DBI.

20.1. Persistent Database Connections with Apache::DBI

When people first started to use the Web, they found that they needed to write web interfaces to their databases, or add databases to drive their web interfaces. Whichever way you look at it, they needed to connect to the databases in order to use them.

CGI is the most widely used protocol for building such interfaces, implemented in Apache's mod_cgi and its equivalents. For working with databases, the main limitation of most implementations, including mod_cgi, is that they don't allow persistent connections to the database. For every HTTP request, the CGI script has to connect to the database, and when the request is completed the connection is closed. Depending on the relational database that you use, the time to instantiate a connection may be very fast (for example, MySQL) or very slow (for example, Oracle). If your database provides a very short connection latency, you may get away without having persistent connections. But if not, it's possible that opening a connection may consume a significant slice of the time to serve a request. It may be that if you can cut this overhead you can greatly improve the performance of your service.

Apache::DBI was written to solve this problem. When you use it with mod_perl, you have a database connection that persists for the entire life of a mod_perl process. This is possible because with mod_perl, the child process does not quit when a request has been served. When a mod_perl script needs to use a database, Apache::DBI immediately provides a valid connection (if it was already open) and your script starts doing the real work right away without having to make a database connection first.

Of course, the persistence doesn't help with any latency problems you may encounter during the actual use of the database connections. Oracle, for example, is notorious for generating a network transaction for each row returned. This slows things down if the query execution matches many rows.

You may want to read Tim Bunce's "Advanced DBI" talk, at http://dbi.perl.org/doc/conferences/tim_1999/index.html, which covers many techniques to reduce latency.

20.1.2. When to Use Apache::DBI (and When Not to Use It)

You will want to use the Apache::DBI module only if you are opening just a few database connections per process. If there are ten child processes and each opens two different connections (using different connect( ) arguments), in total there will be 20 opened and persistent connections.

This module must not be used if (for example) you have many users, and a unique connection (with unique connect( ) arguments) is required for each user.[48] You cannot ensure that requests from one user will be served by any particular process, and connections are not shared between the child processes, so many child processes will open a separate, persistent connection for each user. In the worst case, if you have 100 users and 50 processes, you could end up with 5,000 persistent connections, which might be largely unused. Since database servers have limitations on the maximum number of opened connections, at some point new connections will not be permitted, and eventually your service will become unavailable.

[48]That is, database user connections. This doesn't mean that if many people register as users on your web site you shouldn't use Apache::DBI; it is only a very special case.

If you want to use Apache::DBI but you have both situations on one machine, at the time of writing the only solution is to run two mod_perl-enabled servers, one that uses Apache::DBI and one that does not.

In mod_perl 2.0, a threaded server can be used, and this situation is much improved. Assuming that you have a single process with many threads and each unique open connection is needed by only a single thread, it's possible to have a pool of database connections that are reused by different threads.

20.1.5. Caveats and Troubleshooting

This section covers some of the risks and things to keep in mind when using Apache::DBI.

20.1.5.3. Opening connections with different parameters

When Apache::DBI receives a connection request, before it decides to use an existing cached connection it insists that the new connection be opened in exactly the same way as the cached connection. If you have one script that sets AutoCommit and one that does not, Apache::DBI will make two different connections. So, for example, if you have limited Apache to 40 servers at most, instead of having a maximum of 40 open connections, you may end up with 80.

These two connect( ) calls will create two different connections:

my $dbh = DBI->connect
    ("DBI:mysql:test:localhost", '', '',
     {
      PrintError => 1, # warn( ) on errors
      RaiseError => 0, # don't die on error
      AutoCommit => 1, # commit executes immediately
     }
    ) or die "Cannot connect to database: $DBI::errstr";

my $dbh = DBI->connect
    ("DBI:mysql:test:localhost", '', '',
     {
      PrintError => 1, # warn( ) on errors
      RaiseError => 0, # don't die on error
      AutoCommit => 0, # don't commit executes immediately
     }
    ) or die "Cannot connect to database: $DBI::errstr";

Notice that the only difference is in the value of AutoCommit.

However, you are free to modify the handle immediately after you get it from the cache, so always initiate connections using the same parameters and set AutoCommit (or whatever) afterward. Let's rewrite the second connect( ) call to do the right thing (i.e., not to create a new connection):

my $dbh = DBI->connect
    ("DBI:mysql:test:localhost", '', '',
     {
      PrintError => 1, # warn( ) on errors
      RaiseError => 0, # don't die on error
      AutoCommit => 1, # commit executes immediately
     }
    ) or die "Cannot connect to database: $DBI::errstr";
$dbh->{AutoCommit} = 0; # don't commit if not asked to

When you aren't sure whether you're doing the right thing, turn on debug mode.

When the $dbh attribute is altered after connect( ), it affects all other handlers retrieving this database handle. Therefore, it's best to restore the modified attributes to their original values at the end of database handle usage. As of Apache::DBI Version 0.88, the caller has to do this manually. The simplest way to handle this is to localize the attributes when modifying them:

my $dbh = DBI->connect(...) ...
{
  local $dbh->{LongReadLen} = 40;
}

Here, the LongReadLen attribute overrides the value set in the connect( ) call or its default value only within the enclosing block.

The problem with this approach is that prior to Perl Version 5.8.0 it causes memory leaks. So the only clean alternative for older Perl versions is to manually restore $dbh's values:

my @attrs = qw(LongReadLen PrintError);
my %orig = ( );

my $dbh = DBI->connect(...) ...

# store the values away
$orig{$_} = $dbh->{$_} for @attrs;
# do local modifications
$dbh->{LongReadLen} = 40;
$dbh->{PrintError}  = 1;

# do something with the database handle
# ...

# now restore the values
$dbh->{$_} = $orig{$_} for @attrs;

Another thing to remember is that with some database servers it's possible to access more than one database using the same database connection. MySQL is one of those servers. It allows you to use a fully qualified table specification notation. So if there is a database foo with a table test and a database bar with its own table test, you can always use:

SELECT * FROM foo.test ...

or:

SELECT * FROM bar.test ...

No matter what database you have used in the database name string in the connect( ) call (e.g., DBI:mysql:foo:localhost), you can still access both tables by using a fully qualified syntax.

Alternatively, you can switch databases with USE foo and USE bar, but this approach seems less convenient, and therefore error-prone.



Library Navigation Links

Copyright © 2003 O'Reilly & Associates. All rights reserved.