home | O'Reilly's CD bookshelfs | FreeBSD | Linux | Cisco | Cisco Exam  


Perl CookbookPerl CookbookSearch this book

14.5. Sorting Large DBM Files

14.5.3. Description

An annoyance of hashes, whether in memory or as DBM files, is that they do not maintain proper ordering. The CPAN module Tie::IxHash can make a regular hash in memory maintain its insertion order, but that doesn't help for DBM databases or arbitrary sorting criteria.

The DB_File module supports a nice solution to this using a B-tree implementation. One advantage of a B-tree over a regular DBM hash is its ordering. When the user defines a comparison function, all calls to keys, values, and each are automatically ordered. For example, Example 14-3 is a program that maintains a hash whose keys will always be sorted case-insensitively.

Example 14-3. sortdemo

  #!/usr/bin/perl
  # 
  sortdemo - show auto dbm sorting
  use strict;
  use DB_File;
  
  $DB_BTREE->{'compare'} = sub {
      my ($key1, $key2) = @_ ;
      "\L$key1" cmp "\L$key2" ;
  };
  
  my %hash;
  my $filename = '/tmp/sorthash.db';
  tie(%hash, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE)
      or die "can't tie $filename: $!";
  
  my $i = 0;
  for my $word (qw(Can't you go camp down by Gibraltar)) {
      $hash{$word} = ++$i;
  }
  
  while (my($word, $number) = each %hash) {
      printf "%-12s %d\n", $word, $number;
  }

By default, the entries in a B-tree DB_File database are stored alphabetically. Here, though, we provide a case-insensitive comparison function, so using each to fetch all the keys would show:

by           6
camp         4
Can't        1
down         5
Gibraltar    7
go           3
you          2

This sorting property on hashes is so convenient that it's worth using even without a permanent database. If you pass undef where the filename is expected on the tie, DB_File will create a file in /tmp and then immediately unlink it, giving an anonymous database:

tie(%hash, "DB_File", undef, O_RDWR|O_CREAT, 0666, $DB_BTREE)
        or die "can't tie: $!";

Remember these two things if you supply a comparison for your BTREE database. One, the new compare function must be specified when you create the database. Two, you cannot change the ordering once the database has been created; you must use the same compare function every time you access the database.

Using BTREE databases under DB_File also permits duplicate or partial keys. See its documentation for examples.



Library Navigation Links

Copyright © 2003 O'Reilly & Associates. All rights reserved.