Basic Non-DBI Databases (Programming the Perl DBI)

Chapter 2. Basic Non-DBI Databases

There are several ways in which databases organize the data contained within them. The most common of these is the relational database methodology. Databases that use a relational model are called Relational Database Management Systems , or RDBMSs. The most popular database systems nowadays (such as Oracle, Informix, and Sybase) are all relational in design.

But what does "relational" actually mean? A relational database is a database that is perceived by the user as a collection of tables, where a table is an unordered collection of rows. (Loosely speaking, a relation is a just a mathematical term for such a table.) Each row has a fixed number of fields, and each field can store a predefined type of data value, such as an integer, date, or string.

Another type of methodology that is growing in popularity is the object-oriented methodology, or OODBMS. With an object-oriented model, everything within the database is treated as an object of a certain class that has rules defined within itself for manipulating the data it encapsulates. This methodology closely follows that of object-oriented programming languages such as Smalltalk, C++, and Java. However, the DBI does not support any real OODBMS, so for the moment this methodology will not be discussed further.

In this chapter, we'll be exploring some non-DBI databases, ranging from the very simplest of ASCII data files through to disk-based hash files supporting duplicate keys. Along the way, we'll consider concurrent access and locking issues, and some applications for the rather useful Storable and Data::Dumper modules. (While none of this is strictly about the DBI, we think it'll be useful for many people, and even DBI veterans may pick up a few handy tricks.)

An API allows programmers to interact with a more complex piece of software through access paths defined by the original software creators. A good example of this is the Berkeley Database Manager API. In addition to simply accessing the data, the API allows you to alter the structure of the database and the data stored within the database. The benefit of this higher level of access to a database is that you don't need to worry about how the Berkeley Database Manager is managing the data. You are manipulating an abstracted view via the API.

In higher-level layers such as those implemented by an RDBMS, the data access and manipulation API is completely divorced from the structure of the database. This separation of logical model from physical representation allows you to write standard database code (e.g., SQL) that is independent of the database engine that you are using.

2.1. Storage Managers and Layers

Modern databases, no matter which methodology they implement, are generally composed of multiple layers of software. Each layer implements a higher level of functionality using the interfaces and services defined by the lower-level layers.

For example, flat-file databases are composed of pools of data with very few layers of abstraction. Databases of this type allow you to manipulate the data stored within the database by directly altering the way in which the data is stored within the data files themselves. This feature gives you a lot of power and flexibility at the expense of being difficult to use, minimal in terms of functionality, and nerve-destroying since you have no safety nets. All manipulation of the data files uses the standard Perl file operations, which in turn use the underlying operating system APIs.

DBM file libraries, like Berkeley DB, are an example of a storage manager layer that sits on top of the raw data files and allows you to manipulate the data stored within the database through a clearly defined API. This storage manager translates your API calls into manipulations of the data files on your behalf, preventing you from directly altering the structure of the data in such a manner that it becomes corrupt or unreadable. Manipulating a database via this storage manager is far easier and safer than doing it yourself.

You could potentially implement a more powerful database system on top of DBM files. This new layer would use the DBM API to implement more powerful features and add another layer of abstraction between you and the actual physical data files containing the data.

There are many benefits to using higher-level storage managers. The levels of abstraction between your code and the underlying database allow the database vendors to transparently add optimizations, alter the structure of the database files, or port the database engine to other platforms without you having to alter a single line of code.

Chapter 2. Basic Non-DBI Databases

Contents:

2.1. Storage Managers and Layers