Chapter 2. Basic Non-DBI Databases
There are several ways in which databases organize the data contained within them. The most common of these is the relational database methodology. Databases that use a relational model are called Relational Database Management Systems, or RDBMSs. The most popular database systems nowadays (such as Oracle, Informix, and Sybase) are all relational in design.
But what does "relational" actually mean? A relational database is a database that is perceived by the user as a collection of tables, where a table is an unordered collection of rows. (Loosely speaking, a relation is just a mathematical term for such a table.) Each row has a fixed number of fields, and each field can store a predefined type of data value, such as an integer, date, or string.
Another methodology that is growing in popularity is the object-oriented model, and databases built on it are called Object-Oriented Database Management Systems, or OODBMSs. With an object-oriented model, everything within the database is treated as an object of a certain class, and that class defines its own rules for manipulating the data the object encapsulates. This methodology closely follows that of object-oriented programming languages such as Smalltalk, C++, and Java. However, the DBI does not support any real OODBMS, so for the moment this methodology will not be discussed further.
Finally, there are several simplistic database packages that exist on various operating systems. These simple database packages generally do not feature the more sophisticated functionality that "real" database engines provide. They are, to all intents and purposes, only slightly sophisticated file-handling routines, not true database packages. However, in their defense, they can be extremely fast, and in certain situations the sophisticated functionality that a "real" database system provides is simply unnecessary overhead.
In this chapter, we'll be exploring some non-DBI databases, ranging from the very simplest of ASCII data files through to disk-based hash files supporting duplicate keys. Along the way, we'll consider concurrent access and locking issues, and some applications for the rather useful Storable and Data::Dumper modules. (While none of this is strictly about the DBI, we think it'll be useful for many people, and even DBI veterans may pick up a few handy tricks.)
All of these database technologies, from the most complex to the simplest, share two basic attributes. The first is the very definition of the term: a database is a collection of data stored on a computer with varying layers of abstraction sitting on top of it. Each layer of abstraction generally makes the data stored within easier to both organize and access, by separating the request for particular data from the mechanics of getting that data.
The second basic attribute common to all database systems is that they all use Application Programming Interfaces (APIs) to provide access to the data stored within the database. In the case of the simplest databases, the API is simply the file read/write calls provided by the operating system, accessed via your favorite programming language.
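As a minimal sketch of such a "database", the example below uses nothing but Perl's own open, print, and readline calls. The colon-delimited record layout (name:location:type) and the sample records are illustrative assumptions, and a scratch file from File::Temp stands in for a real data file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical record layout: name:location:type, one record per line.
my @records = (
    [ 'Stonehenge', 'Wiltshire', 'Stone Circle' ],
    [ 'Avebury',    'Wiltshire', 'Stone Circle' ],
);

# A scratch file acts as the "database"; it is deleted when the program exits.
my ( $fh, $file ) = tempfile( UNLINK => 1 );

# "Insert": append each record as one colon-delimited line.
print {$fh} join( ':', @$_ ), "\n" for @records;
close $fh or die "Cannot close $file: $!";

# "Query": read the file back line by line and split each record into fields.
open my $in, '<', $file or die "Cannot open $file: $!";
my @rows;
while ( my $line = <$in> ) {
    chomp $line;
    push @rows, [ split /:/, $line ];
}
close $in;

print "$_->[0] is in $_->[1]\n" for @rows;
```

Here the "API" is simply the operating system's file interface; every decision about layout, delimiters, and parsing belongs to your own code.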
An API allows programmers to interact with a more complex piece of software through access paths defined by the original software creators. A good example of this is the Berkeley Database Manager API. In addition to simply accessing the data, the API allows you to alter the structure of the database and the data stored within the database. The benefit of this higher level of access to a database is that you don't need to worry about how the Berkeley Database Manager is managing the data. You are manipulating an abstracted view via the API.
In higher-level layers such as those implemented by an RDBMS, the data access and manipulation API is completely divorced from the structure of the database. This separation of logical model from physical representation allows you to write standard database code (e.g., SQL) that is independent of the database engine that you are using.
2.1. Storage Managers and Layers
Modern databases, no matter which methodology they implement, are generally composed of multiple layers of software. Each layer implements a higher level of functionality using the interfaces and services defined by the lower-level layers.
For example, flat-file databases are composed of pools of data with very few layers of abstraction. Databases of this type allow you to manipulate the data stored within the database by directly altering the way in which the data is stored within the data files themselves. This feature gives you a lot of power and flexibility at the expense of being difficult to use, minimal in terms of functionality, and nerve-destroying since you have no safety nets. All manipulation of the data files uses the standard Perl file operations, which in turn use the underlying operating system APIs.
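To illustrate what "no safety nets" means in practice, the sketch below changes one field of one record in a flat file. Because a line in a text file cannot simply be overwritten when its length changes, the code reads every record, writes a complete new copy, and renames that copy over the original; nothing but your own care prevents a bug here from corrupting the data. The update_location helper and the name:location:type layout are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: set the location field of the record named $name.
# Assumed layout: name:location:type, one record per line.
sub update_location {
    my ( $file, $name, $new_location ) = @_;

    open my $in, '<', $file or die "Cannot open $file: $!";
    my $tmp = "$file.tmp";
    open my $out, '>', $tmp or die "Cannot create $tmp: $!";

    my $changed = 0;
    while ( my $line = <$in> ) {
        chomp $line;
        my @fields = split /:/, $line, -1;    # -1 keeps trailing empty fields
        if ( defined $fields[0] && $fields[0] eq $name ) {
            $fields[1] = $new_location;
            $changed++;
        }
        print {$out} join( ':', @fields ), "\n";
    }
    close $in;
    close $out or die "Cannot close $tmp: $!";

    # Replace the original file with the rewritten copy.
    rename $tmp, $file or die "Cannot rename $tmp to $file: $!";
    return $changed;
}

# Usage: build a small file, then update one record.
my ( $fh, $file ) = tempfile( UNLINK => 1 );
print {$fh} "Stonehenge:Wiltshire:Stone Circle\n",
            "Callanish:Lewis:Stone Circle\n";
close $fh or die "Cannot close $file: $!";

my $n = update_location( $file, 'Callanish', 'Isle of Lewis' );

open my $check, '<', $file or die "Cannot open $file: $!";
my @lines = <$check>;
close $check;
print "updated $n record(s)\n", @lines;
```

Writing to a separate file and renaming it limits the damage if the program dies halfway, but concurrent writers and locking remain entirely your responsibility.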
DBM file libraries, like Berkeley DB, are an example of a storage manager layer that sits on top of the raw data files and allows you to manipulate the data stored within the database through a clearly defined API. This storage manager translates your API calls into manipulations of the data files on your behalf, preventing you from directly altering the structure of the data in such a manner that it becomes corrupt or unreadable. Manipulating a database via this storage manager is far easier and safer than doing it yourself.
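In Perl, DBM storage managers are normally reached through tie, which makes a DBM file look like an ordinary hash. The sketch below uses SDBM_File, the DBM module bundled with Perl, purely so that it runs anywhere; Berkeley DB is reached through the DB_File module in much the same way. The database name "megaliths" and its keys are hypothetical.

```perl
use strict;
use warnings;
use Fcntl;                 # exports O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;
use File::Temp qw(tempdir);

# Scratch directory; SDBM stores its data in megaliths.pag and megaliths.dir.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/megaliths";    # hypothetical database name

my %db;
tie( %db, 'SDBM_File', $path, O_RDWR | O_CREAT, 0666 )
    or die "Cannot tie $path: $!";

# Each hash operation is translated into DBM calls on the underlying files.
$db{'Stonehenge'} = 'Wiltshire';
$db{'Avebury'}    = 'Wiltshire';
delete $db{'Avebury'};

untie %db;

# Re-open read-only to show that the data persisted on disk.
my %copy;
tie( %copy, 'SDBM_File', $path, O_RDONLY, 0666 )
    or die "Cannot re-tie $path: $!";
my $location = $copy{'Stonehenge'};
my $count    = scalar keys %copy;
untie %copy;

print "Stonehenge is in $location ($count record)\n";
```

The program never touches the .pag and .dir files directly; the storage manager decides how they are laid out. SDBM limits each key/value pair to roughly 1 KB in total, which is one reason Berkeley DB is usually preferred for real data.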
You could potentially implement a more powerful database system on top of DBM files. This new layer would use the DBM API to implement more powerful features and add another layer of abstraction between you and the actual physical data files containing the data.
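As a rough sketch of such an extra layer, the hypothetical MegalithStore package below keeps multi-field records in a DBM file by packing the fields into a single string, so callers work with records rather than raw key/value strings. The package name, its methods, the field list, and the "\0" separator are all illustrative assumptions, and SDBM_File again stands in for Berkeley DB.

```perl
use strict;
use warnings;

package MegalithStore;    # hypothetical record layer built on a DBM file
use Fcntl;
use SDBM_File;

my @FIELDS = qw(location type);    # assumed record layout

sub new {
    my ( $class, $path ) = @_;
    my %db;
    tie( %db, 'SDBM_File', $path, O_RDWR | O_CREAT, 0666 )
        or die "Cannot tie $path: $!";
    return bless { db => \%db }, $class;
}

# Store a record given as a hash reference. Fields are packed with "\0",
# so field values must not themselves contain that character.
sub store {
    my ( $self, $name, $rec ) = @_;
    $self->{db}{$name} =
        join "\0", map { defined $rec->{$_} ? $rec->{$_} : '' } @FIELDS;
    return;
}

# Fetch a record as a hash reference, or undef if the key is absent.
sub fetch {
    my ( $self, $name ) = @_;
    my $packed = $self->{db}{$name};
    return unless defined $packed;
    my %rec;
    @rec{@FIELDS} = split /\0/, $packed, -1;
    return \%rec;
}

# Release the underlying DBM file.
sub finish {
    my ($self) = @_;
    untie %{ $self->{db} };
    return;
}

package main;
use File::Temp qw(tempdir);

my $dir   = tempdir( CLEANUP => 1 );
my $store = MegalithStore->new("$dir/megaliths");
$store->store( 'Stonehenge', { location => 'Wiltshire', type => 'Stone Circle' } );

my $rec     = $store->fetch('Stonehenge');
my $missing = $store->fetch('Nowhere');
print "$rec->{location}, $rec->{type}\n";
$store->finish;
```

Callers of this layer never see the packing format, so it could later be changed (for example, to use Storable) without touching their code, which is exactly the benefit the extra layer of abstraction provides.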
There are many benefits to using higher-level storage managers. The levels of abstraction between your code and the underlying database allow the database vendors to transparently add optimizations, alter the structure of the database files, or port the database engine to other platforms without you having to alter a single line of code.
Copyright © 2001 O'Reilly & Associates. All rights reserved.