14.6. Sorting Large DBM FilesProblemYou want to process a large dataset you'd like to commit to a DBM file in a particular order. SolutionUse the DB_File's B-tree bindings and supply a comparison function of your own devising: use DB_File; # specify the Perl sub to do key comparison using the # exported $DB_BTREE hash reference $DB_BTREE->{'compare'} = sub { my ($key1, $key2) = @_ ; "\L$key1" cmp "\L$key2" ; }; tie(%hash, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE) or die "can't tie $filename: $!"; DescriptionAn annoyance of hashes, whether in memory or as DBM files, is that they do not maintain proper ordering. The CPAN module Tie::IxHash can make a regular hash in memory maintain its insertion order, but that doesn't help you for DBM databases or arbitrary sorting criteria.
The DB_File module supports a nice solution to this using a
B-tree implementation. One advantage of a B-tree over a regular DBM hash is its ordering. When the user defines a comparison function, all calls to Example 14.4: sortdemo#!/usr/bin/perl # sortdemo - show auto dbm sorting use strict; use DB_File; $DB_BTREE->{'compare'} = sub { my ($key1, $key2) = @_ ; "\L$key1" cmp "\L$key2" ; }; my %hash; my $filename = '/tmp/sorthash.db'; tie(%hash, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE) or die "can't tie $filename: $!"; my $i = 0; for my $word (qw(Can't you go camp down by Gibraltar)) { $hash{$word} = ++$i; } while (my($word, $number) = each %hash) { printf "%-12s %d\n", $word, $number; }
By default, the entries in a B-tree DB_File database are stored alphabetically. Here, though, we provide a case-insensitive comparison function, so using
This sorting property on hashes is so convenient that it's worth using even without a permanent database. If you pass tie(%hash, "DB_File", undef, O_RDWR|O_CREAT, 0666, $DB_BTREE) or die "can't tie: $!"; Remember these two things if you supply a comparison for your BTREE database. One, the new compare function must be specified when you create the database. Two, you cannot change the ordering once the database has been created; you must use the same compare function every time you access the database. Using BTREE databases under DB_File also permits duplicate or partial keys. See its documentation for examples. |
|