16.3. Fixed-length Random-access Databases

Another form of persistent data is the fixed-length, record-oriented disk file.[350] In this scheme, the data consists of a number of records of identical length. The numbering of the records is either not important or determined by some indexing scheme.
For example, we might want to store some information about each bowler at Bedrock Lanes. Let's say we decide to have a series of records, one per bowler, in which the data holds the player's name, age, last five bowling scores, and the time and date of his last game. We need to decide upon a suitable format for this data. Let's say that after studying the available formats in the documentation for pack, we decide to use 40 characters for the player's name, a one-byte integer for his age,[351] five two-byte integers for his last five scores,[352] and a four-byte integer for the timestamp of his most recent game,[353] giving a format string of "a40 C S5 L". (S is pack's code for a two-byte unsigned integer.) Each record is thus 40 + 1 + 10 + 4 = 55 bytes long. If we were reading all of the data in the database, we'd read chunks of 55 bytes until we got to the end. If we wanted to go to the fifth record, we'd skip ahead 4 x 55 bytes (220 bytes) and read the fifth record directly.
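The record arithmetic above can be checked by letting pack build one record and measuring it. This is only a sketch: the bowler's name and scores are made-up sample values, and it assumes the S code (always 16 bits) for the two-byte scores.

```perl
use strict;
use warnings;

# Layout from the text: 40-byte name, 1-byte age,
# five 2-byte scores, 4-byte timestamp.
my $format = "a40 C S5 L";

# Pack placeholder values (hypothetical data) and measure the result.
my $record = pack($format, "Fred Flintstone", 30, 205, 190, 188, 210, 176, 0);
my $record_length = length $record;

# To reach the fifth record, skip over records one through four.
my $offset_of_fifth = 4 * $record_length;

print "record length: $record_length bytes\n";
print "fifth record starts at byte $offset_of_fifth\n";
```

Because pack pads the name with NUL bytes out to 40 characters, the record comes out the same length no matter how short the name is.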
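Perl supports programs that use such a disk file. In order to do so, however, you need to learn a few more things, including how to open a disk file for both reading and writing, move around in the file to an arbitrary position, fetch data by length rather than up to the next newline, and write data back in fixed-length blocks.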
The open function has an additional mode we haven't shown yet. If you use "+<" at the front of the filename parameter's string, that is similar to using "<" to open the existing file for reading, except that it also asks for write permission on the file. Thus you can have read/write access to the file:

    open(FRED, "<fred");   # open file fred for reading (error if file absent)
    open(FRED, "+<fred");  # open file fred read/write (error if file absent)

Similarly, "+>" says to create a new file (as ">" would), but to have read access to it as well, thus also giving read/write access:

    open(WILMA, ">wilma");   # make new file wilma (wiping out existing file)
    open(WILMA, "+>wilma");  # make new file wilma, but also with read access

Do you see the important difference between the two new modes? Both give read/write access to a file. But "+<" lets you work with an existing file; it doesn't create it. The second mode, "+>", isn't often useful, because it gives read/write access to a new, empty file that it has just created. That's mostly used for temporary (scratch) files.

Once we've got the file open, we need to move around in it. You do this with the seek function:

    seek(FRED, 55 * $n, 0);  # seek to start of record $n

The first parameter to seek is a filehandle, the second parameter gives the offset in bytes from the start of the file, and the third parameter is zero.[354] To get to a certain record in our file of bowling data, you'll need to skip over some other records. Since each record is 55 bytes long, we'll multiply $n by 55 to find out which byte position we want. (Note that the record numbers are thus zero-based; record zero is at the beginning of the file.)
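Here is a small sketch of "+<" and seek working together. Since "+<" needs an existing file, the sketch first makes a scratch file of three blank records with the standard File::Temp module; that module, the scratch file, and the tell function (which reports the current byte position) are our own additions for illustration, not part of the bowling program.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $record_length = 55;

# Make a scratch file holding three blank records so "+<" has something to open.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print $tmp_fh "\0" x (3 * $record_length);
close $tmp_fh or die "Cannot close $filename: $!";

# "+<" opens the existing file for reading and writing without wiping it out.
open(FRED, "+<$filename") or die "Cannot open $filename: $!";
binmode FRED;

# Jump to the start of record 2 (the third record, since numbering is zero-based).
my $n = 2;
seek(FRED, $record_length * $n, 0) or die "Cannot seek: $!";

# tell reports where the next read or write will happen.
my $position = tell(FRED);
print "positioned at byte $position\n";
close FRED;
```

Checking the return value of seek, as done here, is a cheap way to catch a bad offset before reading or writing at the wrong place.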
Once the file pointer has been positioned with seek, the next input or output operation will start at that position. When we're ready to read from the file, we can't use the ordinary line-input operator because that's made to read lines, not 55-byte records. There may not be a newline character in this entire file, or it may appear in packed data in the middle of a record. Instead, we'll use the read function:

    my $buf;  # The input buffer variable
    my $number_read = read(FRED, $buf, 55);

As you can see, the first parameter to read is the filehandle. The second parameter is a buffer variable; the data read will be placed into this variable. (Yes, this is an odd way to get the result.) The third parameter is the number of bytes to read; here we've asked for 55 bytes, since that's the size of our record. Normally, you can expect the length of $buf to be the specified number of bytes, and you can expect the return value (in $number_read) to be the same. But if your current position in the file is only five bytes from the end when you request 55 bytes, you'll get only five. Under normal circumstances, you'll get as many bytes as you ask for.

Once you've got those 55 bytes, what can you do with them? You can unpack them (using the format we previously designed) to get the bowler's name and other information, of course:

    my($name, $age, $score_1, $score_2, $score_3, $score_4, $score_5, $when) =
      unpack "a40 C S5 L", $buf;

Since we can read the information from the file with read, can you guess how we can write it back into the file? Sorry, it's not write; that was a trick question.[355] You already know the correct function, which is print. But you have to be sure that the data string is exactly the right size; if it's too large, you'll overwrite the next record's data, but if it's too small, leftover data in the current record may be mixed with the new data. To ensure that the length is correct, we'll use pack.
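To see read and unpack in action, including the short read near the end of a file, here is a sketch that builds its own scratch file: one complete record followed by five stray bytes. The sample bowler, the scratch file made with File::Temp, and the S code for the two-byte scores are assumptions for the sake of the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $format = "a40 C S5 L";
my $record_length = length pack($format, "", 0, 0, 0, 0, 0, 0, 0);

# Build a scratch file: one whole record followed by five stray bytes.
my ($fh, $filename) = tempfile(UNLINK => 1);
binmode $fh;
print $fh pack($format, "Wilma Flintstone", 28, 201, 187, 195, 210, 178, 1_000_000_000);
print $fh "stray";
close $fh or die "Cannot close $filename: $!";

open(FRED, "<$filename") or die "Cannot open $filename: $!";
binmode FRED;

# A full read returns exactly one record's worth of bytes.
my $buf;
my $number_read = read(FRED, $buf, $record_length);
my ($name, $age, @rest) = unpack $format, $buf;
my $when   = pop @rest;   # the last field is the timestamp
my @scores = @rest;       # what remains is the five scores
$name =~ s/\0+\z//;       # "a40" padded the name with NUL bytes; strip them

print "read $number_read bytes: $name, age $age, scores @scores\n";

# Only five bytes remain, so this read comes up short.
my $short = read(FRED, $buf, $record_length);
print "second read returned $short bytes\n";
close FRED;
```

Comparing the return value of read against the record length is how a real program would notice a truncated or damaged file before trying to unpack garbage.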
Let's say that Wilma has just bowled a game and her new score is in $new_score. That will be the first of the five most-recent scores we keep for her ($score_5, as the oldest one, will be discarded), and in place of $when (the timestamp of her previous game), we'll store the current time from the time function:

    print FRED pack("a40 C S5 L", $name, $age,
      $new_score, $score_1, $score_2, $score_3, $score_4, time);

On some systems, you'll have to use seek whenever you switch from reading to writing, even if the current position in the file is already correct. It's not a bad idea, then, to always use seek right before reading or printing.

Rather than use the two constant values "a40 C S5 L" and 55 throughout the program, as we've done here, it would generally be better to define them just once near the top of the code. That way, if we ever need to change the database format, we don't have to go searching through our code for places where the number 55 appears. Here's one way you might define both of those values, using the length function to determine the length of a string so you won't have to count bytes:

    my $pack_format = "a40 C S5 L";
    my $pack_length = length pack($pack_format, "dummy data", 0, 1, 2, 3, 4, 5, 6);

Copyright © 2002 O'Reilly & Associates. All rights reserved.