8. File Contents
The most brilliant decision in all of Unix was the choice of a single character for the newline sequence.
- Mike O'Dell, only half jokingly

8.0. Introduction

Before the Unix Revolution, every kind of data source and destination was inherently different. Getting two programs merely to understand each other required heavy wizardry and the occasional sacrifice of a virgin stack of punch cards to an itinerant mainframe repairman. This computational Tower of Babel made programmers dream of quitting the field to take up a less painful hobby, like autoflagellation.

These days, such cruel and unusual programming is largely behind us. Modern operating systems work hard to provide the illusion that I/O devices, network connections, process control information, other programs, the system console, and even users' terminals are all abstract streams of bytes called files. This lets you easily write programs that don't care where their input came from or where their output goes. Because programs read and write via byte streams of simple text, every program can communicate with every other program. It is difficult to overstate the power and elegance of this approach. No longer dependent upon troglodyte gnomes with secret tomes of JCL (or COM) incantations, users can now create custom tools from smaller ones by using simple command-line I/O redirection, pipelines, and backticks.

Treating files as unstructured byte streams necessarily governs what you can do with them. You can read and write sequential, fixed-size blocks of data at any location in the file, increasing its size if you write past the current end. Perl uses the standard C I/O library to implement reading and writing of variable-length records like lines, paragraphs, and words.

What can't you do to an unstructured file? Because you can't insert or delete bytes anywhere but at end of file, you can't change the length of, insert, or delete records. An exception is the last record, which you can delete by truncating the file to the end of the previous record. For other modifications, you need to use a temporary file or work with a copy of the file in memory. If you need to do this a lot, a database system may be a better solution than a raw file (see Chapter 14, Database Access).
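As a rough sketch of the temporary-file approach described above, the following deletes one record from the middle of a file by copying every other record to a temporary file and renaming it over the original. The helper name delete_record and the sample data are hypothetical, chosen only for illustration; the sketch assumes line-oriented records and a filesystem where rename can replace an existing file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: copy $path to a temporary file in the same
# directory, skipping record number $skip (1-based), then rename the
# copy over the original.
sub delete_record {
    my ($path, $skip) = @_;
    open(my $in, '<', $path) or die "Couldn't open $path: $!\n";
    my ($out, $tmp) = tempfile("$path.XXXXXX", UNLINK => 0);
    my $n = 0;
    while (defined(my $line = <$in>)) {
        next if ++$n == $skip;          # drop the unwanted record
        print $out $line or die "Couldn't write to $tmp: $!\n";
    }
    close($in);
    close($out) or die "Couldn't close $tmp: $!\n";
    rename($tmp, $path) or die "Couldn't rename $tmp to $path: $!\n";
}

# Demonstration on a scratch file with three records.
my ($fh, $demo) = tempfile();
print $fh "alpha\nbeta\ngamma\n";
close($fh) or die "Couldn't close $demo: $!\n";

delete_record($demo, 2);

open(my $check, '<', $demo) or die "Couldn't reopen $demo: $!\n";
my @left = <$check>;
close($check);
print @left;                            # the records that survived
unlink $demo;
```

The same shape works for inserting or lengthening a record: write the changed data to the copy at the right point instead of skipping a line.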
The most common files are text files, and the most common operations on text files are reading and writing lines.
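Use <FH> to read lines and print to write them. The <FH> operator returns undef at end of file or on error, so use it in loops like this:

    while (defined ($line = <DATAFILE>)) {
        chomp $line;
        $size = length $line;
        print "$size\n";                # output size of line
    }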
Because this is a common operation and that's a lot to type, Perl gives it a shorthand notation. This shorthand reads lines into $_ instead of $line, and functions such as chomp and length operate on $_ when given no argument:

    while (<DATAFILE>) {
        chomp;
        print length, "\n";             # output size of line
    }
Call <FH> in scalar context to read the next line. Call it in list context to read all remaining lines:

    @lines = <DATAFILE>;
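To make the difference between the two contexts concrete, here is a small sketch that reads from an in-memory filehandle (a string opened as a file, available in Perl 5.8 and later) so that it runs without any data file. The sample text and variable names are illustrative assumptions.

```perl
use strict;
use warnings;

my $text = "first\nsecond\nthird\n";
open(my $fh, '<', \$text) or die "Couldn't open in-memory file: $!\n";

my $first = <$fh>;      # scalar context: reads exactly one line
my @rest  = <$fh>;      # list context: reads every remaining line
close($fh);

chomp($first, @rest);   # chomp accepts a list of variables
print "first line: $first\n";
print "remaining:  @rest\n";
```

Reading in list context pulls the rest of the file into memory at once, so the line-at-a-time loop is usually the safer choice for large files.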
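Each time <FH> reads a record from a filehandle, it increments the special variable $., the current input record number.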
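Another special variable is $/, the input record separator. It defaults to "\n", the usual definition of a line. Set $/ to undef to read the rest of the file as one scalar:

    undef $/;
    $whole_file = <FILE>;               # 'slurp' mode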
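Because $/ is a global variable, changing it affects every later read in the program. A common way to limit the change, sketched here with an in-memory filehandle and made-up sample text, is to localize $/ inside a block so that the old value comes back automatically when the block ends.

```perl
use strict;
use warnings;

my $text = "line one\nline two\n";
open(my $fh, '<', \$text) or die "Couldn't open in-memory file: $!\n";

my $whole_file = do {
    local $/;           # $/ is undef only until this block exits
    <$fh>;              # scalar context: the whole file in one read
};
close($fh);

# Outside the block, $/ has its previous value again.
my $restored = (defined $/ && $/ eq "\n") ? "yes" : "no";
print "read ", length($whole_file), " bytes; separator restored: $restored\n";
```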
The -0 option to Perl lets you set $/ from the command line:

    % perl -040 -e '$word = <>; print "First word is $word\n";'
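The digits after -0 are the octal value of the single character that $/ is to be set to. Because only one character can be given this way, you can't use -0 to set $/ to a multi-character string such as "%%\n", the separator used in fortune files. Instead, set it in a BEGIN block:

    % perl -ne 'BEGIN { $/="%%\n" } chomp; print if /Unix/i' fortune.dat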
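The same multi-character separator can be used inside a script. The sketch below, using invented sample records in an in-memory filehandle rather than a real fortune file, reads "%%\n"-delimited records and keeps those that mention Unix. Note that chomp removes whatever $/ currently holds, not just a newline.

```perl
use strict;
use warnings;

# Illustrative data: three records separated by "%%\n".
my $data = "Unix is simple.\n%%\nTime flies.\n%%\nReading about UNIX pipes.\n";
open(my $fh, '<', \$data) or die "Couldn't open in-memory file: $!\n";

my @hits;
{
    local $/ = "%%\n";                  # record separator for this block
    while (defined(my $record = <$fh>)) {
        chomp $record;                  # strips a trailing "%%\n" if present
        push @hits, $record if $record =~ /Unix/i;
    }
}
close($fh);

print "matched ", scalar(@hits), " records\n";
print @hits;
```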
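Use print to write a line or any other data. The print function writes its arguments one after another and doesn't add a line or record terminator by default:

    print HANDLE "One", "two", "three"; # "Onetwothree"
    print "Baa baa black sheep.\n";     # Sent to default output handle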
There is no comma between the filehandle and the data to print. If you put a comma in there, Perl gives the error message "No comma allowed after filehandle".
All systems use the virtual "\n" to represent a line terminator, called a newline.
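Use the read function to read a fixed-length record. It takes three arguments: a filehandle, a scalar variable, and the number of bytes to read. It returns the number of bytes read, 0 at end of file, or undef on error, so test the result with defined rather than for truth:

    $rv = read(HANDLE, $buffer, 4096);
    defined($rv) or die "Couldn't read from HANDLE : $!\n";
    # $rv is the number of bytes read (0 at end of file),
    # $buffer holds the data read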
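A loop over fixed-length records might look like the following sketch. The 8-byte record size and the sample data are assumptions made for the example; the data deliberately ends with a short record to show that read can return fewer bytes than requested.

```perl
use strict;
use warnings;

my $RECSIZE = 8;                        # assumed record length in bytes
my $data    = "AAAAAAAABBBBBBBBCCC";    # two full records and a short tail
open(my $fh, '<', \$data) or die "Couldn't open in-memory file: $!\n";
binmode($fh);                           # treat the data as raw bytes

my @records;
while (1) {
    my $rv = read($fh, my $buffer, $RECSIZE);
    die "Couldn't read: $!\n" unless defined $rv;
    last if $rv == 0;                   # end of file
    warn "short record of $rv bytes\n" if $rv < $RECSIZE;
    push @records, $buffer;
}
close($fh);

print scalar(@records), " records read\n";
```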
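The truncate function changes the length of a file, which can be specified as a filehandle or as a filename. It returns true if the file was successfully truncated, false otherwise:

    truncate(HANDLE, $length)
        or die "Couldn't truncate: $!\n";
    truncate("/tmp/$$.pid", $length)
        or die "Couldn't truncate: $!\n";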
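The introduction noted that the last record is the one record you can delete in place. Here is a minimal sketch of that idea, assuming line-oriented records and a platform that supports truncate: remember the offset at which each line starts, then truncate the file at the start of the final line. The scratch file and its contents are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a scratch file with three records.
my ($tmp_fh, $path) = tempfile();
print $tmp_fh "one\ntwo\nthree\n";
close($tmp_fh) or die "Couldn't close $path: $!\n";

# Open for update and find where the last line begins.
open(my $fh, '+<', $path) or die "Couldn't open $path: $!\n";
my $last_start = 0;
while (1) {
    my $pos  = tell($fh);               # offset before reading a line
    my $line = <$fh>;
    last unless defined $line;
    $last_start = $pos;                 # start of the most recent line
}
truncate($fh, $last_start) or die "Couldn't truncate: $!\n";
close($fh) or die "Couldn't close $path: $!\n";

# Read the file back to see what is left.
open(my $check, '<', $path) or die "Couldn't reopen $path: $!\n";
my $remaining = do { local $/; <$check> };
close($check);
print $remaining;
unlink $path;
```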
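Each filehandle keeps track of where it is in the file. Reads and writes occur from this point, unless you've specified the O_APPEND flag for that filehandle. Fetch the file position for a filehandle with tell, and set it with seek:

    $pos = tell(DATAFILE);
    print "I'm $pos bytes from the start of DATAFILE.\n";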
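The seek function takes three arguments: the filehandle, the offset (in bytes) to go to, and a numeric argument indicating how to interpret the offset. 0 indicates an offset from the start of the file (the kind of value returned by tell); 1, an offset from the current location; and 2, an offset from the end of the file:

    seek(LOGFILE, 0, 2)         or die "Couldn't seek to the end: $!\n";
    seek(DATAFILE, $pos, 0)     or die "Couldn't seek to $pos: $!\n";
    seek(OUT, -20, 1)           or die "Couldn't seek back 20 bytes: $!\n";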
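The following sketch uses tell as a bookmark and seek to return to it, so the same line can be read twice. It uses the symbolic constants SEEK_SET, SEEK_CUR, and SEEK_END from the standard Fcntl module, which stand for the numeric values 0, 1, and 2. The sample text is an assumption for the example.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);                    # SEEK_SET, SEEK_CUR, SEEK_END

my $text = "alpha\nbeta\ngamma\n";
open(my $fh, '<', \$text) or die "Couldn't open in-memory file: $!\n";

my $first  = <$fh>;
my $mark   = tell($fh);                 # bookmark: start of second line
my $second = <$fh>;

seek($fh, $mark, SEEK_SET) or die "Couldn't seek to $mark: $!\n";
my $again  = <$fh>;                     # rereads the second line

seek($fh, 0, SEEK_END) or die "Couldn't seek to the end: $!\n";
my $size   = tell($fh);                 # offset of end of file
close($fh);

print "bookmark at byte $mark, file is $size bytes\n";
print "reread: $again";
```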
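So far we've been describing buffered I/O. That is, <FH>, print, read, seek, and tell are all operations that use buffers for speed. Perl also provides unbuffered I/O operations: sysread, syswrite, and sysseek.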
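The sysread and syswrite functions both take a filehandle to act on, a scalar variable to read into or write out from, and the number of bytes to read or write. They can also take an optional fourth argument, the offset in the scalar variable at which to start reading or writing:

    $written = syswrite(DATAFILE, $mystring, length($mystring));
    die "syswrite failed: $!\n" unless $written == length($mystring);
    $read = sysread(INFILE, $block, 256, 5);
    warn "only read $read bytes, not 256" if 256 != $read;

The syswrite call sends the contents of $mystring to DATAFILE. The sysread call reads up to 256 bytes from INFILE and stores them starting 5 characters into $block, leaving its first 5 characters intact. Both functions return the number of bytes actually transferred, which can be less than the number requested.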
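Since syswrite may transfer fewer bytes than requested, one common pattern is to loop until everything has been written, using the fourth argument to resume where the previous call stopped. The sketch below shows that loop and then a sysread that uses an offset to keep a prefix already in the buffer. The scratch file, the "HEAD:" prefix, and the sample string are illustrative assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($out, $path) = tempfile();
my $mystring = "0123456789";

# Keep calling syswrite until the whole string has gone out.
my $offset = 0;
while ($offset < length $mystring) {
    my $written = syswrite($out, $mystring,
                           length($mystring) - $offset, $offset);
    die "syswrite failed: $!\n" unless defined $written;
    $offset += $written;
}
close($out) or die "Couldn't close $path: $!\n";

# Read the data back, storing it 5 characters into $block.
open(my $in, '<', $path) or die "Couldn't open $path: $!\n";
my $block = "HEAD:";
my $read  = sysread($in, $block, 256, 5);
die "sysread failed: $!\n" unless defined $read;
close($in);
unlink $path;

print "read $read bytes, block is now '$block'\n";
```

Avoid mixing these calls with buffered operations such as print or <FH> on the same filehandle, since the buffered layer does not know about data transferred behind its back.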
The sysseek function doubles as an unbuffered replacement for both seek and tell. It takes the same arguments as seek, but it returns the new position if successful or undef on error. To find the current position within the file:

    $pos = sysseek(HANDLE, 0, 1);       # don't change position
    die "Couldn't sysseek: $!\n" unless defined $pos;

These are the basic operations available to you. The art and craft of programming lies in using these basic operations to solve complex problems like finding the number of lines in a file, reversing the order of lines in a file, randomly selecting a line from a file, building an index for a file, and so on.

Copyright © 2002 O'Reilly & Associates. All rights reserved.