
10.3 Record-Oriented Approach

In this section, we will study three modules that essentially depend on the DBM library. DBM is a disk-based hash table, originally written by Ken Thompson for the Seventh Edition Unix system. This library has since spawned many variants: SDBM (Simple DBM, a public-domain module bundled with Perl), NDBM (New DBM, which is packaged with some operating systems), and GDBM (from the Free Software Foundation). All these libraries can be accessed from equivalent Perl modules, which use Perl's tie facility to provide transparent access to the disk-based table. Performance and portability are the only criteria for selecting one of these systems. Be warned that the files produced by these approaches are not interchangeable.

10.3.1 DBM

We use SDBM here, because it is bundled with Perl. The SDBM_File module provides a wrapper over this extension:

use Fcntl;
use SDBM_File;
tie (%capital, 'SDBM_File', 'capitals.dat', O_RDWR|O_CREAT, 0666) 
     || die $!;
$capital{USA}      = "Washington D.C.";
$capital{Colombia} = "Bogota";
untie %capital;

The tie statement associates the in-memory hash variable, %capital, with the disk-based hash file, capitals.dat. Read and write accesses to %capital are automatically translated to corresponding accesses to the file. untie breaks this association and flushes any pending changes to the disk. O_RDWR and O_CREAT, "constants" imported from Fcntl, specify that capitals.dat is to be opened for reading and writing, and created if it doesn't exist. The file's mode (the bitmask for access privileges) is requested as 0666 and ends up as 0644 in this case: the result of 0666 & ~022, where 022 is the umask.
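To see that the data really lives on disk, the following sketch ties the file, stores two entries, unties it, and then reads the entries back through a second hash. The scratch directory, the variable names, and the read-only second tie are illustrative additions rather than part of the example above, and the mode arithmetic assumes a umask of 022.

```perl
use strict;
use warnings;
use Fcntl;                       # O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;
use File::Temp qw(tempdir);

# Assumption: a scratch directory, so the example leaves nothing behind.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/capitals.dat";

# First session: create the file and store two entries.
my %capital;
tie(%capital, 'SDBM_File', $file, O_RDWR|O_CREAT, 0666) or die "tie: $!";
$capital{USA}      = "Washington D.C.";
$capital{Colombia} = "Bogota";
untie %capital;

# Second session: a fresh hash tied to the same file sees the same data.
my %again;
tie(%again, 'SDBM_File', $file, O_RDONLY, 0) or die "tie: $!";
my @found = map { "$_ => $again{$_}" } sort keys %again;
untie %again;
print "$_\n" for @found;

# Permission bits that result from requesting 0666 under a umask of 022.
my $mode = sprintf "%04o", 0666 & ~022;
print "mode: $mode\n";
```

The keys are sorted explicitly because a DBM hash, like an ordinary hash, returns them in no particular order.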

The biggest problem with the DBM approaches mentioned earlier is that the value in a tied key-value pair has to be a string or number. If it is a reference, these modules do not follow it to the underlying data; they store only the reference's string form (something like HASH(0x80f4c38)), and the structure itself is lost. So to associate a key with a complex data structure, you must serialize the structure using Data::Dumper or FreezeThaw, which is exactly what is done by MLDBM, described next.
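The limitation is easy to reproduce. The sketch below stores a hash reference directly and then stores a Data::Dumper serialization of the same structure; only the second can be rebuilt. The file name, keys, and variable names are made up for illustration, and using eval to rebuild the structure assumes you trust the contents of the file.

```perl
use strict;
use warnings;
use Fcntl;
use SDBM_File;
use Data::Dumper;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);          # assumption: scratch location
my %db;
tie(%db, 'SDBM_File', "$dir/refs", O_RDWR|O_CREAT, 0666) or die "tie: $!";

my $colors = { burnt => 'umber', brownian => 'motion' };

# Storing the reference directly keeps only its string form.
$db{raw} = $colors;
my $raw = $db{raw};                       # a string such as "HASH(0x...)"

# Serializing first preserves the structure.
$Data::Dumper::Indent = 0;                # compact, single-line output
$Data::Dumper::Terse  = 1;                # bare structure, no '$VAR1 ='
$db{dumped} = Dumper($colors);

# Reading back: eval turns the dumped text into a new structure.
# The parentheses make Perl parse the leading '{' as a hash constructor.
my $text = $db{dumped};
my $copy = eval "($text)";
die "could not rebuild structure: $@" if $@;
untie %db;

print "raw value: $raw\n";
print "rebuilt:   $copy->{burnt}, $copy->{brownian}\n";
```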

10.3.2 MLDBM

Gurusamy Sarathy's MLDBM (multilevel DBM) stores complex values in a DBM file. It uses Data::Dumper to serialize any data structures, and uses a DBM module of your choice (SDBM_File is used by default) to send it to disk. This is how it is used:

use SDBM_File;
use MLDBM qw (SDBM_File); 
use Fcntl;
tie (%h, 'MLDBM', 'bar', O_CREAT|O_RDWR, 0666) || die $!;
$sample   = {'burnt' => 'umber', 'brownian' => 'motion'} ;
$h{pairs} = $sample;   # Creating a disk-based hash of hashes
untie %h;

All parameters to tie following the string "MLDBM" are simply passed to the module specified in the use statement.
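The pass-through arrangement can be pictured with a small tie class of our own. This is only a sketch of the idea, not MLDBM's actual implementation: TinyMLDBM is a hypothetical name, it hard-wires SDBM_File and Data::Dumper, and it implements just FETCH and STORE.

```perl
use strict;
use warnings;
use Fcntl;
use SDBM_File;
use Data::Dumper;
use File::Temp qw(tempdir);

package TinyMLDBM;   # hypothetical class, for illustration only

sub TIEHASH {
    my ($class, @dbm_args) = @_;
    # Everything after the class name goes straight to the DBM module.
    my $inner = SDBM_File->TIEHASH(@dbm_args) or return;
    return bless { db => $inner }, $class;
}

sub STORE {
    my ($self, $key, $value) = @_;
    local $Data::Dumper::Indent = 0;      # compact output
    local $Data::Dumper::Terse  = 1;      # no '$VAR1 =' prefix
    $self->{db}->STORE($key, Data::Dumper::Dumper($value));
}

sub FETCH {
    my ($self, $key) = @_;
    my $text = $self->{db}->FETCH($key);
    return undef unless defined $text;
    my $value = eval "($text)";           # rebuild the structure
    die "corrupt record for '$key': $@" if $@;
    return $value;
}

package main;

my $dir = tempdir(CLEANUP => 1);          # assumption: scratch location
my %h;
tie(%h, 'TinyMLDBM', "$dir/bar", O_CREAT|O_RDWR, 0666) or die "tie: $!";
$h{pairs} = { burnt => 'umber', brownian => 'motion' };
my $back = $h{pairs};                     # a fresh copy rebuilt from disk
print "brownian => $back->{brownian}\n";
untie %h;
```

Because FETCH rebuilds a new copy each time, changing $back afterwards does not alter what is stored on disk; the modified structure has to be assigned back to $h{pairs}.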

10.3.3 Berkeley DB

DB [5], also referred to as Berkeley DB, is a freely available C library of database access methods, including B+Tree, Extended Linear Hashing, and fixed/variable length records. The latest release also supports concurrent updates, transactions, and recovery. The corresponding Perl module, DB_File, puts a DBM wrapper around the B-tree and hashing implementations, and a tied array wrapper over the fixed/variable length records (also known as the recno access method).

The DBM usage is identical to that shown in the preceding sections. The tie statement is as follows:

use DB_File;
use Fcntl;    # For the constants O_RDWR and O_CREAT
tie (%h, 'DB_File', $file, O_RDWR|O_CREAT, 0666, $DB_BTREE)
     || die $!;

The $DB_BTREE constant tells the library to use the btree format, allowing the key-value pairs to be stored in a sorted, balanced multiway tree; that is, the keys are stored in lexical order. You can also specify your custom sorting subroutine like this:

# Install the comparison routine before calling tie
$DB_BTREE->{'compare'} = \&sort_ignorecase;
sub sort_ignorecase {
    my ($key1, $key2) = @_;
    $key1 =~ s/\s+//g;          # Get rid of white space
    $key2 =~ s/\s+//g;
    lc($key1) cmp lc($key2);    # Ignore case when comparing
}

Now, when you use keys, values, or each to retrieve data from the tied hash, you get them in your custom sorted order. An ordinary hash and the other DBM modules do not give you this facility.
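A short, self-contained run makes the ordering visible. This sketch assumes DB_File is installed with a working Berkeley DB library. The file name and sample keys are invented for illustration, and the comparison routine is a simplified case-insensitive one.

```perl
use strict;
use warnings;
use Fcntl;
use DB_File;
use File::Temp qw(tempdir);

# Case-insensitive ordering; it must be in place before tie is called.
$DB_BTREE->{compare} = sub {
    my ($key1, $key2) = @_;
    lc($key1) cmp lc($key2);
};

my $dir = tempdir(CLEANUP => 1);          # assumption: scratch location
my %h;
tie(%h, 'DB_File', "$dir/sorted.db", O_RDWR|O_CREAT, 0666, $DB_BTREE)
    or die "tie: $!";

# Insert in arbitrary order, with mixed case.
$h{$_} = length $_ for qw(pear Apple banana Cherry);

my @order = keys %h;                      # comes back in btree order
untie %h;
print "@order\n";
```

With the default lexical comparison, the capitalized keys would sort ahead of the lowercase ones. The same comparison routine should be installed every time the file is opened, since the tree on disk is organized according to it.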

You can use $DB_RECNO instead of $DB_BTREE; DB_File then uses TIEARRAY to treat a file as a collection of variable-length records:

use Fcntl;
use DB_File;
tie (@l, 'DB_File', 'foo.txt', O_RDWR|O_CREAT,0666, $DB_RECNO);
print $l[1];                    # Retrieve second line
$l[3] = 'Three musketeers';     # Modify fourth line
untie @l;
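The effect on the underlying text file can be checked by reading it back with ordinary file I/O. The sketch below assumes DB_File is available; the scratch directory and the three seed lines are made up for illustration.

```perl
use strict;
use warnings;
use Fcntl;
use DB_File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);         # assumption: scratch location
my $file = "$dir/foo.txt";

# Seed an ordinary text file with three lines.
open(my $out, '>', $file) or die "open: $!";
print $out "Athos\nPorthos\nAramis\n";
close $out;

# Tie the file as an array of records, one per line.
my @l;
tie(@l, 'DB_File', $file, O_RDWR|O_CREAT, 0666, $DB_RECNO) or die "tie: $!";
my $second = $l[1];                       # second line, without its newline
$l[3] = 'Three musketeers';               # index 3 is one past the end: appends
untie @l;

# Read the file back the conventional way.
open(my $in, '<', $file) or die "open: $!";
chomp(my @lines = <$in>);
close $in;

print "second line was: $second\n";
print "$_\n" for @lines;
```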

As was mentioned in Chapter 9, Tie, the current TIEARRAY implementation allows only array indexing; operators like push and splice are not supported. The DB_File module provides extra methods called push, pop, shift, unshift, and length, which can be used like this:

$db = tied @l;