8.27. Program: Flat File Indexes

It sometimes happens that you need to jump directly to a particular line number in a file, but the lines vary in length, so you can't use Recipe 8.12. Although you could start at the beginning of the file and read every line, this is inefficient if you're making multiple queries.

The solution is to build an index of fixed-width records, one per line. Each record contains the offset in the data file of the corresponding line. The subroutine in Example 8-10 takes a filehandle for the data file and a filehandle to send the index to. It reads a record at a time and prints the offset at which that record starts to the index, packed into a big-endian unsigned 32-bit integer; see the documentation for the pack function in perlfunc(1) for alternative storage types.

Example 8-10. build_index
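A minimal sketch of such an indexing subroutine follows; it is an illustration of the technique described above, not necessarily the exact listing. The in-memory filehandles and the sample data in the demonstration are assumptions made to keep the sketch self-contained; a real program would open files on disk.

```perl
use strict;
use warnings;

# Sketch: write one 4-byte big-endian offset ("N") per line of the data file.
# usage: build_index($data_filehandle, $index_filehandle)
sub build_index {
    my ($data_file, $index_file) = @_;
    my $offset = 0;                      # where the next line begins

    while (<$data_file>) {
        # $offset still holds the start of the line just read
        print {$index_file} pack("N", $offset);
        $offset = tell($data_file);      # start of the following line
    }
}

# Demonstration with in-memory filehandles (assumed sample data).
my $data = "one\ntwo lines\nthree\n";
my $idx  = "";
open(my $fh,    "<", \$data) or die "Can't open in-memory data: $!\n";
open(my $index, ">", \$idx)  or die "Can't open in-memory index: $!\n";
build_index($fh, $index);
close($index);

my @offsets = unpack("N*", $idx);
print "@offsets\n";                      # prints "0 4 14"
```

Each index entry is exactly four bytes, so the entry for any line can be located by arithmetic alone, without scanning the index.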
Once you have an index, it becomes easy to read a particular line from the data file. Jump to that record in the index, read the offset, and jump to that position in the data file. The next line you read will be the one you want. Example 8-11 returns the line, given the line number and the index and data file handles.

Example 8-11. line_with_index
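The lookup just described can be sketched as below. This is a hedged illustration rather than the definitive listing: it assumes lines are numbered from 1, that index entries are 4-byte big-endian integers as described earlier, and it uses in-memory filehandles with a hand-packed index purely so the example runs on its own.

```perl
use strict;
use warnings;

# Sketch: return line $line_number (counting from 1) of the data file,
# or nothing if the index has no entry for that line.
# usage: line_with_index($data_filehandle, $index_filehandle, $line_number)
sub line_with_index {
    my ($data_file, $index_file, $line_number) = @_;
    return if $line_number < 1;

    my $size     = length(pack("N", 0));         # bytes per index entry
    my $i_offset = $size * ($line_number - 1);   # entry position in index

    seek($index_file, $i_offset, 0) or return;
    my $entry;
    my $got = read($index_file, $entry, $size);
    return unless defined $got && $got == $size; # past the end of the index

    my $d_offset = unpack("N", $entry);          # offset into the data file
    seek($data_file, $d_offset, 0) or return;
    return scalar(<$data_file>);
}

# Demonstration (assumed sample data; the offsets are packed by hand here).
my $data  = "one\ntwo lines\nthree\n";
my $index = pack("N*", 0, 4, 14);                # start offset of each line
open(my $fh,  "<", \$data)  or die "Can't open in-memory data: $!\n";
open(my $ifh, "<", \$index) or die "Can't open in-memory index: $!\n";

my $second  = line_with_index($fh, $ifh, 2);
my $third   = line_with_index($fh, $ifh, 3);
my $missing = line_with_index($fh, $ifh, 4);
print $second;                                   # prints "two lines"
print defined $missing ? "found\n" : "no such line\n";
```

Checking the number of bytes returned by read is a design choice of this sketch: it lets a request for a line beyond the end of the file come back as undef instead of unpacking a short or empty entry.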
To use these subroutines, just say:

    open($fh, "<", $file)
        or die "Can't open $file for reading: $!\n";
    open($index, "+>", "$file.idx")
        or die "Can't open $file.idx for read/write: $!\n";
    build_index($fh, $index);
    $line = line_with_index($fh, $index, $seeking);

The next step is to cache the index file between runs of the program, so you're not building it each time. This is shown in Example 8-12. Then add locking for concurrent access, and check time stamps on the files to see whether a change to the data file has made an old index file out of date.

Example 8-12. cache_line_index
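One way the caching, locking, and time stamp checks might fit together is sketched below. Everything specific here is an assumption for illustration: the helper name open_cached_index is hypothetical, the index is assumed to live in "$filename.idx", and the demonstration writes its sample data to a temporary directory. The two subroutines are repeated in compact form so the sketch runs on its own.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use IO::Handle;
use File::Temp qw(tempdir);

# Write one 4-byte big-endian offset per line of the data file.
sub build_index {
    my ($data_file, $index_file) = @_;
    my $offset = 0;
    while (<$data_file>) {
        print {$index_file} pack("N", $offset);
        $offset = tell($data_file);
    }
}

# Return line $line_number (counting from 1), or nothing if it is absent.
sub line_with_index {
    my ($data_file, $index_file, $line_number) = @_;
    return if $line_number < 1;
    my $size = length(pack("N", 0));
    seek($index_file, $size * ($line_number - 1), 0) or return;
    my $entry;
    my $got = read($index_file, $entry, $size);
    return unless defined $got && $got == $size;
    seek($data_file, unpack("N", $entry), 0) or return;
    return scalar(<$data_file>);
}

# Hypothetical helper: open "$filename.idx" and rebuild it under an
# exclusive lock when it is empty or older than the data file.
# Caveat: time stamps have one-second resolution on many systems, so a
# change made in the same second as the last rebuild can go unnoticed.
sub open_cached_index {
    my ($data_fh, $filename) = @_;
    my $indexname = "$filename.idx";
    sysopen(my $idx, $indexname, O_CREAT | O_RDWR)
        or die "Can't open $indexname for read/write: $!\n";
    binmode($idx);
    flock($idx, LOCK_EX) or die "Can't lock $indexname: $!\n";

    my $data_mtime  = (stat($data_fh))[9];
    my $index_mtime = (stat($idx))[9];
    if ((-z $idx) || $index_mtime < $data_mtime) {
        truncate($idx, 0) or die "Can't truncate $indexname: $!\n";
        seek($idx, 0, 0);
        seek($data_fh, 0, 0);
        build_index($data_fh, $idx);
        $idx->flush;                 # write the index out before unlocking
    }
    flock($idx, LOCK_UN);
    return $idx;
}

# Demonstration with assumed sample data in a temporary directory.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/data.txt";
open(my $out, ">", $file) or die "Can't create $file: $!\n";
print {$out} "one\ntwo lines\nthree\n";
close($out);

open(my $fh, "<", $file) or die "Can't open $file for reading: $!\n";
my $idx  = open_cached_index($fh, $file);
my $line = line_with_index($fh, $idx, 3);
print $line;                         # prints "three"
```

Taking the lock before testing whether the index is empty or stale is deliberate in this sketch: it keeps two copies of the program from both deciding to rebuild the same index at once.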
Copyright © 2003 O'Reilly & Associates. All rights reserved.