# Chapter 19. DBM and mod_perl

Some of the earliest databases implemented on Unix were Database Management (DBM) files, and many are still in use today. As of this writing, the Berkeley DB is the most powerful DBM implementation. Berkeley DB is available at http://www.sleepycat.com/. If you need a light database with an easy API, using simple key-value pairs to store and manipulate a relatively small number of records, DBM is the solution that you should consider first.
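As a minimal sketch of what the "easy API" looks like from Perl, the following ties an ordinary hash to a Berkeley DB file through the DB_File module and then inserts, updates, and deletes records with plain hash operations. The file name and the keys are hypothetical, and the file lives in a temporary directory only so that the sketch cleans up after itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DB_File;
use Fcntl;                          # exports O_RDWR, O_CREAT, O_RDONLY
use File::Temp qw(tempdir);

# Hypothetical location: a scratch directory that is removed at exit
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/stock.db";

# Bind a Perl hash to the DBM file; the file is created if it is missing
tie my %dbm, 'DB_File', $file, O_RDWR|O_CREAT, 0660, $DB_HASH
    or die "Can't tie $file: $!";

$dbm{apple}  = 12;                  # insert
$dbm{banana} = 7;                   # insert
$dbm{apple}  = 13;                  # update
delete $dbm{banana};                # delete

untie %dbm;                         # flush and close the file

# Reopen the file read-only to confirm that the records persisted
tie my %again, 'DB_File', $file, O_RDONLY, 0660, $DB_HASH
    or die "Can't tie $file: $!";
my $apple_count = $again{apple};
my $has_banana  = exists $again{banana} ? 1 : 0;
untie %again;

print "apple => $apple_count, banana present: $has_banana\n";
```

Everything after the tie( ) call is ordinary hash code, which is most of the appeal: the program does not need a query language or a separate server process.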

With DBM, it is rare to read the whole database into memory. Combine this feature with the use of smart storage techniques, and DBM files can be manipulated much faster than flat files. Flat-file databases can be very slow when the number of records starts to grow into the thousands, especially for insert, update, and delete operations. Sort algorithms on flat files can also be very time-consuming.
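The difference shows up most clearly on updates. The sketch below is illustrative only (the file names, the key:value record format, and the 1,000-record data set are all assumptions): changing one record in a flat file means reading and rewriting every line, whereas the DBM file is changed with a single keyed store.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DB_File;
use Fcntl;
use File::Temp qw(tempdir);

# Hypothetical data set: 1,000 "key:value" records in a scratch directory
my $dir       = tempdir(CLEANUP => 1);
my $flat_file = "$dir/users.txt";
my $dbm_file  = "$dir/users.db";

open my $out, '>', $flat_file or die "Can't write $flat_file: $!";
print {$out} "user$_:0\n" for 1 .. 1000;
close $out or die "Can't close $flat_file: $!";

tie my %users, 'DB_File', $dbm_file, O_RDWR|O_CREAT, 0660, $DB_HASH
    or die "Can't tie $dbm_file: $!";
$users{"user$_"} = 0 for 1 .. 1000;

# Flat file: changing one record means reading and rewriting every line
my $lines_rewritten = 0;
open my $in,  '<', $flat_file       or die "Can't read $flat_file: $!";
open my $new, '>', "$flat_file.new" or die "Can't write $flat_file.new: $!";
while (my $line = <$in>) {
    chomp $line;
    my ($key, $value) = split /:/, $line, 2;
    $value = 1 if $key eq 'user500';
    print {$new} join(':', $key, $value), "\n";
    $lines_rewritten++;
}
close $in;
close $new or die "Can't close $flat_file.new: $!";
rename "$flat_file.new", $flat_file
    or die "Can't rename $flat_file.new: $!";

# DBM: the same change is a single keyed store
$users{user500} = 1;
my $dbm_value = $users{user500};
untie %users;

print "flat file: $lines_rewritten lines rewritten; DBM value: $dbm_value\n";
```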

The maximum practical size of a DBM database depends on many factors, such as your data, your hardware, and the desired response times. But as a rough guide, consider 5,000 to 10,000 records to be reasonable.

We will talk mostly about Berkeley DB Version 1.x, as it provides the best functionality while having good speed and almost no limitations. Other implementations might be faster in some cases, but they are limited either in the maximum length of a value or in the total number of records.

#### Big-O Notation

In math, complexity is expressed using big-O notation. For a problem of size N:

• A constant-time method is "order 1": O(1)

• A linear-time method is "order N": O(N)

• A quadratic-time method is "order N squared": O(N²)

For example, a lookup action in a properly implemented hash of size N with random data has a complexity of O(1), because the item is located almost immediately after its hash value is calculated. However, the same action in a list of N items has a complexity of O(N), since on average you have to examine about half of the items in the list, and in the worst case all of them, before you find what you need.
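To make the comparison concrete, here is a small sketch in plain Perl. The item names and the linear_search( ) helper are hypothetical; the helper simply counts how many comparisons it needs, while the hash finds the same item with one keyed lookup.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data: 1,000 items, plus a hash mapping each item to its position
my @items = map { "item$_" } 1 .. 1000;
my %index = map { $items[$_] => $_ } 0 .. $#items;

# Linear search: walk the list and count the comparisons needed
sub linear_search {
    my ($list, $wanted) = @_;
    my $comparisons = 0;
    for my $i (0 .. $#$list) {
        $comparisons++;
        return ($i, $comparisons) if $list->[$i] eq $wanted;
    }
    return (-1, $comparisons);      # not found
}

# O(N): the cost grows with the position of the item in the list
my ($pos_mid,  $steps_mid)  = linear_search(\@items, 'item500');
my ($pos_last, $steps_last) = linear_search(\@items, 'item1000');

# O(1): the hash goes straight to the entry, wherever it is
my $pos_hash = $index{item1000};

print "list: $steps_mid and $steps_last comparisons; hash: position $pos_hash\n";
```

Doubling the size of the list doubles the work of the linear search in the worst case, while the cost of the hash lookup stays roughly constant.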

Most often you will want to use the HASH method (DB_HASH), but there are many considerations, and your choice may be dictated by your application.
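One such consideration is key ordering. The sketch below, which assumes the DB_File module and uses hypothetical file names and keys, stores the same records with $DB_BTREE and with $DB_HASH. With DB_File's default settings a BTREE file returns its keys in lexical order, while a HASH file makes no promise about order.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DB_File;
use Fcntl;
use File::Temp qw(tempdir);

# Hypothetical scratch files, removed at exit
my $dir = tempdir(CLEANUP => 1);

tie my %btree, 'DB_File', "$dir/fruit.btree", O_RDWR|O_CREAT, 0660, $DB_BTREE
    or die "Can't tie $dir/fruit.btree: $!";
tie my %hash,  'DB_File', "$dir/fruit.hash",  O_RDWR|O_CREAT, 0660, $DB_HASH
    or die "Can't tie $dir/fruit.hash: $!";

# Insert the same keys, deliberately out of order, into both files
for my $fruit (qw(pear apple mango cherry)) {
    $btree{$fruit} = 1;
    $hash{$fruit}  = 1;
}

my @btree_keys = keys %btree;       # lexically sorted by the BTREE
my @hash_keys  = keys %hash;        # order is unspecified

untie %btree;
untie %hash;

print "BTREE: @btree_keys\n";
print "HASH:  @hash_keys\n";
```

If the application needs to walk the records in key order, BTREE avoids a separate sort; if it only ever looks records up by key, the ordering is irrelevant.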

In recent years, DBM databases have been extended to allow you to store more complex values, including data structures. The MLDBM module can store and restore nested data structures, including arrays and hashes, by serializing them into ordinary DBM values.
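The underlying idea can be sketched by hand. A plain DBM value is a flat string, so a data structure has to be serialized before it is stored and deserialized after it is fetched. The example below does this manually with the core Storable module on top of DB_File; it is an illustration of the technique, not MLDBM's own API, and the record contents are hypothetical. MLDBM automates these steps behind the tied hash, using Data::Dumper as its default serializer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DB_File;
use Fcntl;
use Storable qw(freeze thaw);
use File::Temp qw(tempdir);

# Hypothetical scratch file, removed at exit
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/profiles.db";

tie my %dbm, 'DB_File', $file, O_RDWR|O_CREAT, 0660, $DB_HASH
    or die "Can't tie $file: $!";

# Hypothetical nested record: a hash holding an array and another hash
my %profile = (
    name      => 'Example User',
    languages => [ 'Perl', 'C' ],
    prefs     => { editor => 'vi' },
);

# Serialize the structure into a string and store it under one key
$dbm{example} = freeze(\%profile);

# Fetch the string back and rebuild the structure from it
my $restored = thaw($dbm{example});

untie %dbm;

print "$restored->{name} uses $restored->{prefs}{editor} and knows ",
      join(', ', @{ $restored->{languages} }), "\n";
```

Note that the restored structure is a copy: changing it in memory does not change the database until the whole value is frozen and stored again.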

#### Example 19-1. btree2hash.pl

```perl
#!/usr/bin/perl -w

#
# This script takes as its parameters a list of Berkeley DB
# file(s) which are stored with the DB_BTREE algorithm.  It
# will back them up using the .btree extension and create
# instead DBMs with the same records but stored using the
# DB_HASH algorithm.
#
# Usage: btree2hash.pl filename(s)

use strict;
use DB_File;
use Fcntl;

# @ARGV checks
die "Usage: btree2hash.pl filename(s)\n" unless @ARGV;

for my $filename (@ARGV) {
    die "Can't find $filename: $!"
        unless -e $filename and -r _;

    # First back up the file
    rename "$filename", "$filename.btree"
        or die "can't rename $filename to $filename.btree: $!";

    # tie both DBs (db_hash is a fresh one!)
    tie my %btree , 'DB_File',"$filename.btree", O_RDWR|O_CREAT,
        0660, $DB_BTREE or die "Can't tie $filename.btree: $!";
    tie my %hash ,  'DB_File',"$filename" , O_RDWR|O_CREAT,
        0660, $DB_HASH  or die "Can't tie $filename: $!";

    # copy DB
    %hash = %btree;

    # untie
    untie %btree;
    untie %hash;
}
```

Note that some DBM implementations come with other conversion utilities as well.