11.2 DBM Modules
A DBM-like file is a file that contains
a set of pairs of strings
(key,data),
with support for fetching or storing the data given a key, known as
keyed access. DBM-like files were originally
supported on early Unix systems, with functionality roughly
equivalent to that of access methods popular on mainframes and
minicomputers of the time, such as ISAM, the Indexed-Sequential
Access Method. Today, several different libraries, available for many
platforms, let programs written in many different languages create,
update, and read DBM-like files.
Keyed access, while not as powerful as the data access functionality
of relational databases, may often suffice for a
program's needs. And if DBM-like files are
sufficient, you may end up with a program that is smaller, faster,
and more portable than one that uses an RDBMS.
The classic
dbm library, whose first version introduced
DBM-like files many years ago, has limited functionality, but tends
to be available on most Unix platforms. The GNU version,
gdbm, is richer and also widespread. The BSD
version, dbhash, offers superior functionality.
Python supplies modules that interface with each of these libraries
if the relevant underlying library is installed on your system.
Python also offers a minimal DBM module, dumbdbm
(usable anywhere, as it does not rely on other installed libraries),
and generic DBM modules, which are able to automatically identify,
select, and wrap the appropriate DBM library to deal with an existing
or new DBM file. Depending on your platform, your Python
distribution, and what dbm-like libraries you
have installed on your computer, the default Python build may install
some subset of these modules. In general, at a minimum, you can rely
on having module dbm on Unix-like platforms,
module dbhash on Windows, and
dumbdbm on any platform.
11.2.1 The anydbm Module
The anydbm module is a
generic interface to any other DBM module. anydbm
supplies a single factory function.
open(filename,flag='r',mode=0666)
Opens or creates the DBM file named by
filename (a string that can denote any
path to a file, not just a name), and returns a suitable mapping
object corresponding to the DBM file. When the DBM file already
exists, open uses module
whichdb to determine which DBM library can handle
the file. When open creates a new DBM file,
open chooses the first available DBM module in
order of preference: dbhash,
gdbm, dbm, and
dumbdbm.
flag is a one-character string that tells
open how to open the file and whether to create
it, as shown in Table 11-1.
mode is an integer that
open uses as the file's
permission bits if open creates the file, as
covered in Section 10.2.2 in Chapter 10. Not all DBM modules use
flag and mode,
but for portability's sake you should always supply
appropriate values for these arguments when you call
anydbm.open.
Table 11-1. flag values for anydbm.open
flag | Read-only? | If file exists | If file does not exist
'r' | yes | open opens the file | open raises error
'w' | no | open opens the file | open raises error
'c' | no | open opens the file | open creates the file
'n' | no | open truncates the file | open creates the file
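The following sketch exercises the flag values on a scratch file. It is an illustration under stated assumptions, not part of the original text: the file name flagdemo and the variable names are hypothetical, and the import falls back to the Python 3 package dbm, which is the renamed successor of anydbm. Keys and values are written as b'...' literals so that the same code runs under both versions. The truncation step assumes the selected backend honors flag; the Python 2 dumbdbm module, for one, ignores it.

```python
import os
import shutil
import tempfile

try:
    import anydbm as generic_dbm   # Python 2 name, as used in this chapter
except ImportError:
    import dbm as generic_dbm      # Python 3 name for the same generic interface

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'flagdemo')   # hypothetical scratch file
try:
    # 'r' on a file that does not exist: open raises an error
    try:
        generic_dbm.open(path, 'r')
        missing_file_raised = False
    except generic_dbm.error:
        missing_file_raised = True

    # 'c' creates the file when it is missing
    m = generic_dbm.open(path, 'c')
    m[b'key'] = b'value'
    m.close()

    # 'c' on an existing file opens it and keeps its contents
    m = generic_dbm.open(path, 'c')
    keys_after_c = list(m.keys())
    m.close()

    # 'n' on an existing file truncates it (if the backend honors flag)
    m = generic_dbm.open(path, 'n')
    keys_after_n = list(m.keys())
    m.close()
finally:
    shutil.rmtree(tmpdir)          # remove the scratch directory
```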
anydbm.open returns a mapping object
m that supplies a subset of the
functionality of dictionaries (covered in Chapter 4). m only accepts
strings as keys and values, and the only mapping methods
m supplies are
m.has_key and
m.keys. However, you
can bind, rebind, access, and unbind items in
m with the same indexing syntax
m[key]
that you would use if m were a dictionary.
If flag is 'r',
open returns a mapping
m that is read-only so that you can only
access m's items, not
bind, rebind, or unbind them. One extra method that
m supplies is
m.close, with the same
semantics as the close method of a built-in file
object. You should ensure
m.close( ) is called
when you're done using m.
The try/finally statement
(covered in Chapter 6) is the best way to ensure
finalization.
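A minimal sketch of the mapping operations just described, with try/finally guarding each close call. The file name mapdemo and the variable names are hypothetical, and the import again falls back to the Python 3 package dbm (the renamed anydbm) so that the example is not tied to one Python version. Only indexing, keys, and close are used, since those are the operations the text guarantees.

```python
import os
import shutil
import tempfile

try:
    import anydbm as generic_dbm   # Python 2 name, as used in this chapter
except ImportError:
    import dbm as generic_dbm      # Python 3 name for the same generic interface

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'mapdemo')    # hypothetical scratch file
try:
    m = generic_dbm.open(path, 'c')
    try:
        m[b'spam'] = b'eggs'       # bind
        m[b'spam'] = b'bacon'      # rebind
        m[b'ham'] = b'toast'
        del m[b'ham']              # unbind
        stored_keys = list(m.keys())
    finally:
        m.close()                  # runs even if an operation above fails

    # Reopen read-only: items can be accessed, but not changed
    m = generic_dbm.open(path, 'r')
    try:
        value = m[b'spam']
    finally:
        m.close()
finally:
    shutil.rmtree(tmpdir)          # remove the scratch directory
```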
11.2.2 The dumbdbm Module
The dumbdbm module
supplies minimal DBM functionality and mediocre performance.
dumbdbm's only advantage is that
you can use it anywhere, since dumbdbm does not
rely on any library. You don't normally
import dumbdbm; rather,
import anydbm, and let
anydbm supply your program with the best DBM
module available, defaulting to dumbdbm if nothing
better is available on the current Python installation. The only case
in which you import dumbdbm directly is the rare
one in which you need to create a DBM-like file that you can later
read from any Python installation. Module dumbdbm
supplies an open function and an exception class
error that are polymorphic to those
anydbm supplies.
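As a sketch of that rare case, the code below creates a file with dumbdbm directly and then reads it back through the generic interface, which identifies the file's format on its own. The file name portable and the variable names are hypothetical. The fallback imports assume Python 3, where dumbdbm is named dbm.dumb and anydbm is named dbm.

```python
import os
import shutil
import tempfile

try:
    import dumbdbm as dumb         # Python 2 names, as used in this chapter
    import anydbm as generic_dbm
except ImportError:
    import dbm.dumb as dumb        # Python 3 names for the same modules
    import dbm as generic_dbm

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'portable')   # hypothetical scratch file
try:
    # Write with dumbdbm explicitly, so no external library is involved
    out = dumb.open(path, 'c')
    try:
        out[b'greeting'] = b'hello'
    finally:
        out.close()

    # Read back through the generic interface, which detects the format
    inp = generic_dbm.open(path, 'r')
    try:
        value = inp[b'greeting']
    finally:
        inp.close()
finally:
    shutil.rmtree(tmpdir)          # remove the scratch directory
```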
11.2.3 The dbm, gdbm, and dbhash Modules
The dbm module exists
only on Unix platforms, where it can wrap any of the
dbm, ndbm, and
gdbm libraries, since each supplies a
dbm-compatibility interface. You never
import dbm directly; rather,
you import anydbm, and let
anydbm supply your program with the best DBM
module available, defaulting to dbm if
appropriate. Module dbm supplies an
open function and an exception class
error that are polymorphic to those
anydbm
supplies.
The
gdbm module wraps the GNU DBM library,
gdbm. The gdbm.open function
accepts other values for the flag
argument, and returns a mapping object m
supplying a few extra methods. You may need to
import gdbm directly, if you
need to access non-portable functionality. I do not cover
gdbm specifics in this book, since the book is
focused on cross-platform Python.
The dbhash module
wraps the BSD DBM library in a DBM-compatible way. The
dbhash.open function accepts other values for the
flag argument, and returns a mapping
object m supplying a few extra methods.
You may choose to import dbhash
directly, if you need to access non-portable functionality. For full
access to the BSD DB functionality, however, you can also
import bsddb, covered in
Section 11.3 later in
this chapter.
11.2.4 The whichdb Module
The whichdb module attempts to guess which of the
several DBM modules can open a given file. whichdb
supplies a single function.
whichdb(filename)
Opens the file specified by filename and
determines which DBM-like package created the file.
whichdb returns None if the
file does not exist or cannot be opened and read.
whichdb returns '' if the file
exists and can be opened and read, but it cannot be determined which
DBM-like package created the file (i.e., the file is not a DBM file).
whichdb returns a string naming a module, such as
'dbm', 'dumbdbm', or
'dbhash', if it can determine which module can
read the DBM-like file named by filename.
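The three kinds of result can be observed with a short experiment, sketched here under a few assumptions. The file names missing, plain.txt, and dumbdemo are hypothetical. The DBM file is created with dumbdbm so that the outcome does not depend on which other libraries are installed. The fallback imports assume Python 3, where the function lives in the dbm package and names the module 'dbm.dumb' rather than 'dumbdbm'.

```python
import os
import shutil
import tempfile

try:
    from whichdb import whichdb    # Python 2 names, as used in this chapter
    import dumbdbm as dumb
except ImportError:
    from dbm import whichdb        # Python 3 names for the same facilities
    import dbm.dumb as dumb

tmpdir = tempfile.mkdtemp()
try:
    missing = os.path.join(tmpdir, 'missing')     # never created
    plain = os.path.join(tmpdir, 'plain.txt')     # ordinary text file
    dbpath = os.path.join(tmpdir, 'dumbdemo')     # DBM-like file

    f = open(plain, 'w')
    f.write('this is just ordinary text, not a DBM file\n')
    f.close()

    d = dumb.open(dbpath, 'c')
    d[b'key'] = b'value'
    d.close()

    result_missing = whichdb(missing)   # None: the file does not exist
    result_plain = whichdb(plain)       # '': readable, but not a DBM file
    result_dbm = whichdb(dbpath)        # a string naming the dumb DBM module
finally:
    shutil.rmtree(tmpdir)               # remove the scratch directory
```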
11.2.5 Examples of DBM-Like File Use
Keyed access is quite suitable when your program needs to record, in
a persistent way, the equivalent of a Python dictionary, with strings
as both keys and values. For example, suppose you need to analyze
several text files, whose names are given as your
program's arguments, and record where each word
appears in those files. In this case, the keys are words, and,
therefore, intrinsically strings. The data you need to record for
each word is a list of
(filename,
line-number) pairs.
However, you can encode the data as a string in several ways, for
example by exploiting the fact that the path separator string
os.pathsep (covered in Chapter 10) does not normally appear in filenames. (Note
that more solid, general, and reliable approaches to the general
issue of encoding data as strings are covered in
Section 11.1 earlier in this
chapter.) With this simplification, the program that records word
positions in files might be as follows:
import fileinput, os, anydbm

# Build an in-memory index: word -> list of 'filename<sep>lineno' strings
wordPos = {}
sep = os.pathsep
for line in fileinput.input():
    pos = '%s%s%s' % (fileinput.filename(), sep, fileinput.filelineno())
    for word in line.split():
        wordPos.setdefault(word, []).append(pos)

# Store each word's positions as one string, joined by a doubled separator
dbmOut = anydbm.open('indexfile', 'n')
sep2 = sep * 2
for word in wordPos:
    dbmOut[word] = sep2.join(wordPos[word])
dbmOut.close()
We can read back the data stored to the DBM-like file
indexfile in several ways. The following example
accepts words as command-line arguments and prints the lines where
the requested words appear:
import sys, os, anydbm, linecache

dbmIn = anydbm.open('indexfile')
sep = os.pathsep
sep2 = sep * 2
for word in sys.argv[1:]:
    if not dbmIn.has_key(word):
        sys.stderr.write('Word %r not found in index file\n' % word)
        continue
    # Undo the encoding: split into positions, then into (filename, lineno)
    places = dbmIn[word].split(sep2)
    for place in places:
        fname, lineno = place.split(sep)
        print "Word %r occurs in line %s of file %s:" % (word,lineno,fname)
        print linecache.getline(fname, int(lineno)),
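The string encoding shared by the two programs above can be isolated into a pair of helper functions, which makes the round trip easy to check on its own. The names encode_positions and decode_positions are hypothetical and appear nowhere in the programs themselves. As in the text, the sketch assumes that os.pathsep never occurs in a filename.

```python
import os

SEP = os.pathsep       # separates a filename from its line number
SEP2 = SEP * 2         # separates one position from the next

def encode_positions(positions):
    """Turn a list of (filename, lineno) pairs into a single string."""
    return SEP2.join(['%s%s%s' % (fname, SEP, lineno)
                      for fname, lineno in positions])

def decode_positions(data):
    """Inverse of encode_positions: rebuild the list of pairs."""
    result = []
    for place in data.split(SEP2):
        fname, lineno = place.split(SEP)
        result.append((fname, int(lineno)))
    return result

# Round trip on sample data (the filenames are made up for the example)
sample = [('a.txt', 3), ('b.txt', 12)]
encoded = encode_positions(sample)
decoded = decode_positions(encoded)
```

Splitting on the doubled separator first is what keeps the scheme unambiguous: a single position contains exactly one separator, so the doubled form can only occur between positions.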