Databases Overview (Practical mod

What's a database? We can use pretty much anything as a database, as long as it allows us to store our data and retrieve it later. There are many different kinds of databases. Some allow us to store data and retrieve it years later; others are capable of preserving data only while there is an electricity supply. Some databases are designed for fast searches, others for fast insertions. Some databases are very easy to use, while some are very complicated (you may even have to learn a whole language to know how to operate them). There are also large price differences.

When we choose a database for our application, we first need to define the requirements in detail (this is known as a specification). If the application is for short-term use, we probably aren't going to use an expensive, advanced database. A quick-and-dirty hack may do. If, on the other hand, we design a system for long-term use, it makes sense to take the time to find the ideal database implementation.

Databases can be of two kinds: volatile and non-volatile. These two concepts pretty much relate to the two kinds of computer memory: RAM-style memory, which usually loses all its contents when the electricity supply is cut off; and magnetic (or optical) memory, such as hard disks and compact discs, which can retain the information even without power.

17.1. Volatile Databases

We use volatile databases all the time, even if we don't think about them as real databases. These databases are usually just part of the programs we run.

17.1.1. In-Memory Databases in a Single Process

If, for example, we want to store the number of Perl objects that exist in our program's data, we can use a variable as a volatile database:

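package Book::ObjectCounter;
use strict;
use warnings;
# File-scoped counter: it lives as long as the process does
my $object_count = 0;
# Constructor: count every object that gets created
sub new {
    my $class = shift;
    $object_count++;
    return bless {  }, $class;
}
# Destructor: Perl calls it when an object is freed
sub DESTROY {
    $object_count--;
}
1;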

In this example, $object_count serves as a database: it stores the number of currently available objects. When a new object is created, the variable is incremented, and when an object is destroyed, it is decremented.

Now imagine a server, such as Apache with mod_perl, where a process can run for months or even years without quitting. An in-memory counter suits this kind of accounting well: if the process quits, all its objects are lost anyway, and we probably won't care how many of them were alive when the process terminated.
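To see the counter at work, here is a minimal sketch that repeats the package in a single script and adds a count() class method. The accessor is our own hypothetical addition for illustration; the listing above does not define one.

```perl
use strict;
use warnings;

package Book::ObjectCounter;

# The volatile "database": a file-scoped lexical
my $object_count = 0;

sub new {
    my $class = shift;
    $object_count++;
    return bless {}, $class;
}

sub DESTROY {
    $object_count--;
}

# Hypothetical accessor (an assumption, not part of the original listing)
sub count {
    return $object_count;
}

package main;

my $first = Book::ObjectCounter->new;
{
    my $second = Book::ObjectCounter->new;
    print "inside block: ", Book::ObjectCounter->count, "\n";   # prints 2
}
# $second went out of scope, so DESTROY has already run for it
print "after block: ", Book::ObjectCounter->count, "\n";        # prints 1

undef $first;
print "after undef: ", Book::ObjectCounter->count, "\n";        # prints 0
```

Since the counter is an ordinary variable, each process has its own copy of it, and the value is gone as soon as that process exits.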

Here is another example:

$DNS_CACHE{$dns} ||= dns_resolve($dns);
print "Hostname $dns has $DNS_CACHE{$dns} IP\n";

This little code snippet takes the hostname stored in $dns and checks whether we have the corresponding IP address cached in %DNS_CACHE. If not, it resolves it and caches it for later reuse. At the end, it prints out both the hostname and the corresponding IP address.
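The snippet does not show dns_resolve(). As a rough sketch, assuming IPv4 only and using nothing but Perl's built-in gethostbyname() and the core Socket module, it might look like this. The cached_ip() wrapper and the host.example entry are hypothetical names of ours, used only to make the example self-contained.

```perl
use strict;
use warnings;
use Socket qw(inet_ntoa);

my %DNS_CACHE;

# One possible dns_resolve(): returns a dotted-quad string,
# or undef if the name cannot be resolved
sub dns_resolve {
    my $host   = shift;
    my $packed = gethostbyname($host);   # scalar context: packed address
    return defined $packed ? inet_ntoa($packed) : undef;
}

# Hypothetical wrapper around the caching idiom from the text
sub cached_ip {
    my $dns = shift;
    $DNS_CACHE{$dns} ||= dns_resolve($dns);
    return $DNS_CACHE{$dns};
}

# Pre-seed one entry so the example runs without network access
$DNS_CACHE{'host.example'} = '192.0.2.7';

my $dns = 'host.example';
print "Hostname $dns has ", cached_ip($dns), " IP\n";
```

Because ||= assigns whenever the cached value is false, a failed lookup (undef) is not cached and will be retried on the next call.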

%DNS_CACHE satisfies our definition of a database. It's a volatile database, since the data disappears when the program quits. When a mod_perl process quits, the cache is lost, but there is a good chance that we won't regret the loss, since we might want to cache only the latest IP addresses anyway. If we want to turn this cache into a non-volatile database, we just need to tie %DNS_CACHE to a DBM file, and we will have a permanent database. We will talk about Database Management (DBM) files in Chapter 19.
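As a small preview, the following sketch ties the cache to a DBM file. It assumes SDBM_File, chosen here only because it ships with Perl; other DBM implementations are tied the same way. The temporary directory, the cached_lookup() helper, and the stub dns_resolve() (which counts its calls instead of querying DNS) are all hypothetical.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;                  # assumption: any DBM flavor would do
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/dns_cache";    # SDBM creates dns_cache.pag and dns_cache.dir

# Stub resolver: counts calls so we can see when the cache is hit
my $lookups = 0;
sub dns_resolve {
    $lookups++;
    return '192.0.2.1';
}

# Tie, look up, untie: each call behaves like a fresh program run
sub cached_lookup {
    my ($dbm_file, $dns) = @_;
    tie(my %DNS_CACHE, 'SDBM_File', $dbm_file, O_RDWR | O_CREAT, 0666)
        or die "Cannot tie $dbm_file: $!";
    $DNS_CACHE{$dns} ||= dns_resolve($dns);
    my $ip = $DNS_CACHE{$dns};
    untie %DNS_CACHE;
    return $ip;
}

my $dns = 'www.example.com';
print "Hostname $dns has ", cached_lookup($file, $dns), " IP\n";
print "Hostname $dns has ", cached_lookup($file, $dns), " IP\n";
print "Resolver was called $lookups time(s)\n";   # prints 1
```

The second call finds the address on disk, so the resolver runs only once even though the hash was untied in between.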

In Chapter 18, we will show how you can benefit from this kind of in-process database under mod_perl. We will also show how during a single request different handlers can share data and how data can persist across many requests.

17.1.2. In-Memory Databases Across Multiple Processes

Under mod_perl, requests are typically served by several child processes, each with its own memory, so data cached in one child is invisible to the others. Sharing results is more efficient than having each child potentially waste a lot of time generating redundant data. On the other hand, the information may not be important enough, or have sufficient long-term value, to merit being stored on disk. In this scenario, Inter-Process Communication (IPC) is a useful tool to have around.

This topic is not specific to mod_perl and is big enough to fill several books on its own. A non-exhaustive list of the modules to look at includes IPC::SysV, IPC::Shareable, IPC::Semaphore, IPC::ShareLite, Apache::Session, and Cache::Cache. And of course, make sure to read the perlipc manpage. Also refer to the books listed in Section 17.3 at the end of this chapter.
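To give a flavor of what these modules build on, here is a minimal sketch in the spirit of the perlipc examples. It uses Perl's built-in System V shared memory calls with constants from IPC::SysV, and assumes a Unix-like system with SysV IPC available. A forked child stores a result in a shared segment and the parent reads it back; the segment size and the message text are arbitrary choices of ours.

```perl
use strict;
use warnings;
use IPC::SysV qw(IPC_PRIVATE IPC_RMID S_IRUSR S_IWUSR);

my $size = 64;   # bytes in the shared segment

# Create a private segment, readable and writable by its owner
my $id = shmget(IPC_PRIVATE, $size, S_IRUSR | S_IWUSR);
defined $id or die "shmget: $!";

my $pid = fork();
defined $pid or die "fork: $!";

if ($pid == 0) {
    # Child: publish a result (shmwrite pads it with NULs up to $size)
    shmwrite($id, "result from child $$", 0, $size) or die "shmwrite: $!";
    exit 0;
}

# Parent: wait for the child, then read what it left behind
waitpid($pid, 0);
my $buffer;
shmread($id, $buffer, 0, $size) or die "shmread: $!";
$buffer =~ s/\0+\z//;            # strip the NUL padding

# Remove the segment; it would otherwise outlive both processes
shmctl($id, IPC_RMID, 0) or die "shmctl: $!";

print "Parent read: $buffer\n";
```

Here waitpid() is what keeps the reader from running ahead of the writer. Processes that run concurrently, as mod_perl children do, need real locking (semaphores, for example), which is a large part of what the higher-level modules listed above take care of.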

Chapter 17. Databases Overview
