Another form of persistent data is the fixed-length, record-oriented disk file. In this scheme, the data consists of a number of records of identical length. The numbering of the records is either not important or determined by some indexing scheme.
For example, we might have a series of records in which the data has 40 characters of first name, a one-character middle initial, 40 characters of last name, and then a two-byte integer for the age. Each record is then 83 bytes long. If we were reading all of the data in the database, we'd read chunks of 83 bytes until we got to the end. If we wanted to go to the fifth record, we'd skip ahead four times 83 bytes (332 bytes) and read the fifth record directly.
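The offset arithmetic can be sketched in a few lines. This is only an illustration: the names `$RECORD_LENGTH` and `record_offset` are invented here and do not come from the text.

```perl
use strict;
use warnings;

# Field widths from the example layout: first name, initial, last name, age.
my $RECORD_LENGTH = 40 + 1 + 40 + 2;    # 83 bytes

# Byte offset at which record number $n begins (records counted from 1).
# record_offset is a hypothetical helper for this sketch.
sub record_offset {
    my ($n) = @_;
    return ($n - 1) * $RECORD_LENGTH;
}

printf "record %d starts at byte %d\n", $_, record_offset($_) for 1, 2, 5;
```

For the fifth record this yields 332, matching the figure above.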
Perl supports programs that use such a disk file. A few things are necessary in addition to what you already know:
- Opening a disk file for both reading and writing, and setting the filehandle to binary mode
- Moving around in this file to an arbitrary position
- Fetching data by a length rather than up to the next newline
- Writing data down in fixed-length blocks
The open function takes an additional plus sign before its I/O direction specification to indicate that the file is really being opened for both reading and writing. For example:
open(A,"+<b"); # open file b read/write (error if file absent)
open(C,"+>d"); # create file d with read/write access (an existing d is emptied first)
open(E,"+>>f"); # open or create file f with read/write access (writes go to the end)
Notice that all we've done is prepend a plus sign to the I/O direction.
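A short sketch of how these modes behave, with the result of each open checked. It uses the three-argument form of open, lexical filehandles, and a scratch directory from File::Temp; these are choices made for this illustration, and the file name names.db is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);    # scratch directory, removed at exit
my $path = "$dir/names.db";          # hypothetical file name

# "+<" requires the file to exist, so this first attempt fails.
my $opened_missing = open(my $missing, '+<', $path) ? 1 : 0;

# "+>" creates the file (and would empty it if it already existed).
open(my $created, '+>', $path) or die "Cannot create $path: $!";
print {$created} "x" x 83;           # one record's worth of filler bytes
close($created) or die "Cannot close $path: $!";

# Now "+<" succeeds and leaves the existing 83 bytes intact.
open(my $names, '+<', $path) or die "Cannot open $path: $!";
my $size = -s $names;
close($names);

print "open before create succeeded? $opened_missing; size after reopen: $size\n";
```

For a database file that already holds records, "+<" is usually the mode to reach for, because "+>" discards the existing contents.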
Next, we need to set the filehandle to binary mode using the binmode function:
binmode(A); # set the filehandle to binary mode
Some operating systems don't need to use binmode, so you may find scripts that don't do this. Windows NT (and Windows 95) systems do need to use binmode, so if you find yourself getting strange results while using a random-access database file, this is the first place you should check.
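One way to see why this matters: a packed integer can contain a byte that happens to look like a newline, and that is the kind of byte a text-mode filehandle may translate. The sketch below assumes a scratch file from File::Temp and an age of 10, both chosen only for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);     # scratch file for the demonstration
binmode($fh) or die "binmode failed: $!";    # no newline translation from here on

# An age of 10 packs into two bytes, one of which is 0x0A, the newline character.
my $age_bytes = pack("s", 10);
print {$fh} $age_bytes;
close($fh) or die "close failed: $!";

my $has_newline  = ($age_bytes =~ /\n/) ? 1 : 0;
my $size_on_disk = -s $path;
print "contains newline byte: $has_newline; bytes on disk: $size_on_disk\n";
```

With binmode in effect, the two bytes reach the disk unchanged, so the record length stays what pack produced.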
After we've got the file open, we need to move around in it. You do this with the seek function, which takes the same three parameters as the C fseek library routine. The first parameter is a filehandle; the second parameter gives an offset, which is interpreted in conjunction with the third parameter. Usually, you'll want the third parameter to be zero so that the second parameter selects a new absolute position for the next read from or write to the file. (A third parameter of 1 makes the offset relative to the current position, and 2 makes it relative to the end of the file.) For example, to go to the fifth record on the filehandle NAMES (as described above), you can do this:
seek(NAMES,4*83,0);
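A slightly more defensive sketch of the same step checks the return value of seek and confirms the new position with tell. The SEEK_SET constant from the Fcntl module is a named equivalent of the literal 0; the scratch file and the ten blank records are assumptions made for this example.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

my $RECORD_LENGTH = 83;

my ($names, $path) = tempfile(UNLINK => 1);     # scratch file standing in for the database
binmode($names);
print {$names} "\0" x (10 * $RECORD_LENGTH);    # ten blank records

# Jump to the start of the fifth record; SEEK_SET has the value 0.
seek($names, 4 * $RECORD_LENGTH, SEEK_SET) or die "seek failed: $!";
my $position = tell($names);                    # current byte offset in the file

print "now at byte $position\n";
close($names);
```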
After the file pointer has been repositioned, the next input or output will start there. For output, use the print operator, but be sure that the data you are writing is the right length. To obtain the right length, we can call upon the pack() operator:
print NAMES pack("A40 A A40 s", $first, $middle, $last, $age);
That pack() specifier gives 40 characters for $first, a single character for $middle, 40 more characters for $last, and a short (two bytes) for the $age. This should be 83 bytes long, and will be written at the current file position.
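To illustrate how pack keeps every record the same length, the sketch below packs one record with short values and one with overlong values. The sample values are invented for the example.

```perl
use strict;
use warnings;

my $format = "A40 A A40 s";

# Short values are padded with spaces; overlong values are cut to the field width.
my $short = pack($format, "Al", "B", "Jones", 7);
my $long  = pack($format, "x" x 60, "MN", "y" x 60, 42);

printf "short record: %d bytes, long record: %d bytes\n",
    length($short), length($long);

# Show the space padding in the first 40-byte field.
print "first field of short record: [", substr($short, 0, 40), "]\n";
```

Both strings come out at 83 bytes, so either can be written over an existing record without shifting its neighbors.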
Last, we need to fetch a particular record. The <NAMES> operator returns all of the data from the current position to the next newline, but that's not what we want here; a record is supposed to run for exactly 83 bytes, and there probably isn't a newline right there. Instead, we use the read function, which looks and works a lot like its C language counterpart:
$count = read(NAMES, $buf, 83);
The first parameter for read is the filehandle. The second parameter is a scalar variable that holds the data that will be read. The third parameter gives the number of bytes to read. The return value from read is the number of bytes actually read; typically, this is the same as the number of bytes asked for unless the filehandle is not opened or you are too close to the end of the file.
After you have the 83-byte record, break it into its component parts with the unpack operator:
($first, $middle, $last, $age) = unpack("A40 A A40 s", $buf);
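Putting seek, read, and unpack together, a round trip might look like the following sketch. The scratch file, the lexical filehandle, and the two sample records are assumptions made for this example.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

my $format = "A40 A A40 s";
my $length = 83;

my ($names, $path) = tempfile(UNLINK => 1);    # scratch file standing in for the database
binmode($names);
print {$names} pack($format, "Fred",   "Q", "Flintstone", 35);
print {$names} pack($format, "Barney", "T", "Rubble",     33);

# Go to the second record and fetch exactly one record's worth of bytes.
seek($names, 1 * $length, SEEK_SET) or die "seek failed: $!";
my $count = read($names, my $buf, $length);
die "read failed: $!" unless defined $count;
die "short record: got $count bytes" unless $count == $length;

# The "A" fields come back with their trailing space padding removed.
my ($first, $middle, $last, $age) = unpack($format, $buf);
print "$first $middle. $last, age $age\n";
close($names);
```

Checking $count against the record length catches a read that runs into the end of the file before a full record is available.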
Note that the pack and unpack format strings are the same. Most programs store this string in a variable early in the program, and even compute the length of the records using pack instead of sprinkling the constant 83 everywhere:
$names = "A40 A A40 s";
$names_length = length(pack($names)); # probably 83
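With the format and length held in variables, the pieces can be wrapped in a pair of small helpers. The following is a minimal sketch, not a definitive implementation: write_record, read_record, the variable names, and the sample data are all hypothetical. Placeholder values are passed to pack explicitly when computing the length, so the call does not depend on how pack treats missing arguments.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

my $names_format = "A40 A A40 s";
my $names_length = length(pack($names_format, "", "", "", 0));

# Write the given fields as record number $n (counting from 1).
sub write_record {
    my ($fh, $n, @fields) = @_;
    seek($fh, ($n - 1) * $names_length, SEEK_SET) or die "seek failed: $!";
    print {$fh} pack($names_format, @fields) or die "write failed: $!";
}

# Return the fields of record number $n, or an empty list if no full record is there.
sub read_record {
    my ($fh, $n) = @_;
    seek($fh, ($n - 1) * $names_length, SEEK_SET) or die "seek failed: $!";
    my $count = read($fh, my $buf, $names_length);
    return unless defined $count && $count == $names_length;
    return unpack($names_format, $buf);
}

my ($fh, $path) = tempfile(UNLINK => 1);    # scratch file standing in for the database
binmode($fh);
write_record($fh, 1, "Fred",   "Q", "Flintstone", 35);
write_record($fh, 2, "Barney", "T", "Rubble",     33);

# Update record 2 in place: read it, bump the age, write it back.
my @rec = read_record($fh, 2);
$rec[3]++;
write_record($fh, 2, @rec);

my @updated  = read_record($fh, 2);
my @past_end = read_record($fh, 3);         # nothing stored there yet
print "@updated\n";
print "records past the end: ", scalar(@past_end), "\n";
```

Because every helper seeks before it reads or writes, the callers never need to track the file position themselves.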