11.6. File Tests
Now you know how to open a filehandle for
output. Normally, that will create a new file, wiping out any
existing file with the same name. Perhaps you want to check that
there isn't a file by that name. Perhaps you need to know how
old a given file is. Or perhaps you want to go through a list of
files to find which ones are larger than a certain number of bytes
and not accessed for a certain amount of time. Perl has a complete
set of tests you can use to find out information about files.
Let's try that first example, where we need to check that a
given file doesn't exist, so that we don't accidentally
overwrite a vital spreadsheet data file, or that important birthday
calendar. For this, we need the -e file test,
testing for existence:
die "Oops! A file called '$filename' already exists.\n"
    if -e $filename;
Notice that we don't include $! in this
die message, since we're not reporting
that the system refused a request in this case. Here's an
example of checking whether a file is being kept up-to-date.
Let's say that our program's configuration file should be
updated every week or two. (Maybe it's checking for computer
viruses, say.) If the file hasn't been modified in the past 28
days, then something is wrong:
warn "Config file is looking pretty old!\n"
    if -M CONFIG > 28;
The third example is more complex. Here, let's say that disk
space is filling up and rather than buy more disks, we've
decided to move any large, useless files to the backup tapes. So
let's go through our list of files[264] to see which of them are larger than 100 K. But even if a
file is large, we shouldn't move it to the backup tapes unless
it hasn't been accessed in the last 90 days (so we know that
it's not used too often):[265]
my @original_files = qw/ fred barney betty wilma pebbles dino bamm-bamm /;
my @big_old_files; # The ones we want to put on backup tapes
foreach my $filename (@original_files) {
    push @big_old_files, $filename
        if -s $filename > 100_000 and -A $filename > 90;
}
This is the first time that we've seen it, so maybe you noticed
that the control variable of the foreach loop is a
my variable. That declares it to have the scope of
the loop itself, so this example should work under use
strict. Without the my keyword, this
would be using the global $filename.
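To try the backup-tape loop without touching real files, here is a minimal sketch that builds three throwaway files in a temporary directory and backdates two of them. The file names, the sizes, and the 120-day backdating are invented for illustration; File::Temp is a standard module.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical test files: name => size in bytes (not from the text)
my %size_for = (small_old => 10, big_new => 200_000, big_old => 200_000);

my $dir = tempdir(CLEANUP => 1);    # removed automatically at exit
foreach my $name (keys %size_for) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    print {$fh} 'x' x $size_for{$name};
    close $fh or die "Cannot close $dir/$name: $!";
}

# Pretend two of the files were last touched 120 days ago
my $long_ago = time - 120 * 24 * 60 * 60;
foreach my $name (qw/ small_old big_old /) {
    utime $long_ago, $long_ago, "$dir/$name"
        or die "Cannot backdate $dir/$name: $!";
}

my @big_old_files;
foreach my $filename (sort keys %size_for) {
    my $path = "$dir/$filename";
    push @big_old_files, $filename
        if -s $path > 100_000 and -A $path > 90;
}
print "Candidates for the backup tapes: @big_old_files\n";
```

Only the file that is both large and stale should be selected; the small old file and the large new file each fail one of the two tests.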
The file tests all look like a hyphen and a letter, which is the name
of the test, followed by either a filename or a filehandle to be
tested. Many of them return a true/false value, but several give
something more interesting. See Table 11-1 for the
complete list, and then read the following discussion to learn more
about the special cases.
Table 11-1. File tests and their meanings
File test   Meaning
-r          File or directory is readable by this (effective) user or group
-w          File or directory is writable by this (effective) user or group
-x          File or directory is executable by this (effective) user or group
-o          File or directory is owned by this (effective) user
-R          File or directory is readable by this real user or group
-W          File or directory is writable by this real user or group
-X          File or directory is executable by this real user or group
-O          File or directory is owned by this real user
-e          File or directory name exists
-z          File exists and has zero size (always false for directories)
-s          File or directory exists and has nonzero size (the value is the size in bytes)
-f          Entry is a plain file
-d          Entry is a directory
-l          Entry is a symbolic link
-S          Entry is a socket
-p          Entry is a named pipe (a "fifo")
-b          Entry is a block-special file (like a mountable disk)
-c          Entry is a character-special file (like an I/O device)
-u          File or directory is setuid
-g          File or directory is setgid
-k          File or directory has the sticky bit set
-t          The filehandle is a TTY (as reported by the isatty() system function; filenames can't be tested by this test)
-T          File looks like a "text" file
-B          File looks like a "binary" file
-M          Modification age (measured in days)
-A          Access age (measured in days)
-C          Inode-modification age (measured in days)
The tests -r, -w,
-x, and -o tell whether the
given attribute is true for the effective user or group ID,[266] which essentially refers to the person who is "in
charge of" running the program.[267] These tests look at
the "permission bits" on the file to see what is
permitted. If your system uses Access Control Lists (ACLs), the tests
will use those as well. These tests generally tell whether the system
would try to permit something, but it
doesn't mean that it really would be possible. For example,
-w may be true for a file on a CD-ROM, even though
you can't write to it, or -x may be true on
an empty file, which can't truly be executed.
The -s test does return true if the file is
nonempty, but it's a special kind of true. It's the
length of the file, measured in bytes, which evaluates as true for a
nonzero number.
On a Unix filesystem,[268] there are
just seven types of items, represented by the seven file tests
-f, -d, -l,
-S, -p, -b,
and -c. Any item should be one of those. But if
you have a symbolic link pointing to a file, that
will report true for both -f and
-l. So if you want to know whether something is a
symbolic link, you should generally test that first. (We'll
learn more about symbolic links in Chapter 13, "Manipulating Files and Directories".)
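The advice to test for a symbolic link first can be sketched as a small classifier. The sub name entry_type and its labels are invented for this example, and because symlink is not implemented on every system, that part is wrapped in eval.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Return a short description of a directory entry.
# The -l test comes first because -f and -d follow symbolic links.
sub entry_type {
    my ($path) = @_;
    return 'symbolic link'    if -l $path;
    return 'plain file'       if -f $path;
    return 'directory'        if -d $path;
    return 'socket'           if -S $path;
    return 'named pipe'       if -p $path;
    return 'block device'     if -b $path;
    return 'character device' if -c $path;
    return 'missing or unknown';
}

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/plain.txt";
open my $fh, '>', $file or die "Cannot create $file: $!";
close $fh or die "Cannot close $file: $!";

print "$dir is a ",  entry_type($dir),  "\n";
print "$file is a ", entry_type($file), "\n";

# symlink may be unavailable or may fail, so guard the call
my $link = "$dir/link";
if (eval { symlink $file, $link }) {
    print "$link is a ", entry_type($link), "\n";
}
```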
The age tests, -M, -A, and
-C (yes, they're uppercase), return the
number of days since the file was last modified, accessed, or had its
inode changed.[269] (The inode contains all of the information about the file
except for its contents -- see the stat system
call manpage or a good book on Unix internals for details.) This age
value is a full floating-point number, so you might get a value of
2.00001 if a file were modified two days and one
second ago. (These "days" aren't necessarily the
same as a human would count; for example, if it's one thirty in
the morning when you check a file modified at about an hour before
midnight, the value of -M for this file would be
around 0.1, even though it was modified
"yesterday.")
When checking the age of a file, you might even get a negative value
like -1.2, which means that the file's
last-access timestamp is set at about thirty hours in the future! The
zero point on this timescale is the moment your program started
running,[270] so that value might mean
that a long-running program was looking at a file that had just been
accessed. Or a timestamp could be set (accidentally or intentionally)
to a time in the future.
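Perl keeps the program's start time, in seconds since the Epoch, in the special variable $^T, and that is the zero point the age tests use. As a rough sketch (the temporary file and the two-day offset are made up for the demonstration), this compares -M with an age computed from the current moment instead:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);
close $fh or die "Cannot close $path: $!";

# Set the access and modification times to exactly two days
# before this program started ($^T is the start time in epoch seconds)
my $two_days_before_start = $^T - 2 * 24 * 60 * 60;
utime $two_days_before_start, $two_days_before_start, $path
    or die "Cannot set times on $path: $!";

my $age_since_start = -M $path;                         # measured from $^T
my $mtime           = (stat $path)[9];
my $age_right_now   = (time - $mtime) / (24 * 60 * 60); # measured from now

printf "Age relative to program start: %.5f days\n", $age_since_start;
printf "Age relative to this moment:   %.5f days\n", $age_right_now;
```

For a short-lived program the two numbers are nearly identical; in a long-running one, the second calculation avoids the negative ages described above.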
The tests -T and -B take a try
at telling whether a file is text or binary. But people who know a
lot about filesystems know that there's no bit (at least in
Unix-like operating systems) to indicate that a file is a binary or
text file -- so how can Perl tell? The answer is that Perl cheats:
it opens the file, looks at the first few thousand bytes, and makes
an educated guess. If it sees a lot of null bytes, unusual control
characters, and bytes with the high bit set, then that looks like a
binary file. If there's not much weird stuff then it looks like
text. As you might guess, it sometimes guesses wrong. If a text file
has a lot of Swedish or French words (which may have characters
represented with the high bit set, as some ISO-8859-something
variant, or perhaps even a Unicode version), it may fool Perl into
declaring it binary. So it's not perfect, but if you just need
to separate your source code from compiled files, or HTML files from
PNGs, these tests should do the trick.
You'd think that -T and
-B would always disagree, since a text file
isn't a binary and vice versa, but there are two special cases
where they're in complete agreement. If the file doesn't
exist, both are false, since it's neither a text file nor a
binary. Alternatively, if the file is empty, it's an empty text
file and an empty binary file at the same time, so they're both
true.
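The two special cases are easy to check for yourself. In this sketch the file names and contents are invented, chosen so that Perl's guess should be unambiguous: plain ASCII for the text file and a run of null and control bytes for the binary one.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Hypothetical sample contents, chosen to make the guess easy
my %content_for = (
    'text.txt'   => "Just some plain ASCII text.\n" x 20,
    'binary.dat' => "\0\1\2\3" x 100,   # null bytes strongly suggest binary
    'empty'      => '',
);

foreach my $name (sort keys %content_for) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    binmode $fh;                        # write the bytes exactly as given
    print {$fh} $content_for{$name};
    close $fh or die "Cannot close $dir/$name: $!";
}

# 'missing' is deliberately never created
foreach my $name (sort(keys %content_for), 'missing') {
    my $path = "$dir/$name";
    my $is_text   = (-T $path) ? 'yes' : 'no';
    my $is_binary = (-B $path) ? 'yes' : 'no';
    printf "%-10s  -T: %-3s  -B: %-3s\n", $name, $is_text, $is_binary;
}
```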
The -t file test returns true if the given
filehandle is a TTY -- in short, if it's able to be
interactive because it's not a simple file or pipe. When
-t STDIN returns true, it generally means that you
can interactively ask the user questions. If it's false, your
program is probably getting input from a file or pipe, rather than a
keyboard.
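A common use of -t is deciding whether to prompt at all. This is only a sketch; the sub name input_mode and its two labels are made up for the example.

```perl
use strict;
use warnings;

# Report how the program appears to be receiving its input.
sub input_mode {
    my $is_terminal = -t STDIN;
    return $is_terminal ? 'interactive' : 'batch';
}

if (input_mode() eq 'interactive') {
    print "Looks like a terminal; it is reasonable to prompt the user.\n";
} else {
    print "Input is probably a file or pipe; skipping the prompts.\n";
}
```

Running the program normally and then again with input redirected from a file should show both branches.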
Don't worry if you don't know what some of the other file
tests mean -- if you've never heard of them, you won't
be needing them. But if you're curious, get a good book about
programming for Unix.
(On non-Unix systems, these tests all try to give results analogous
to what they do on Unix. Usually you'll be able to guess
correctly what they'll do.)
If you omit the filename or filehandle parameter to a file test (that
is, if you have just -r or just
-s, say), the default operand is the file named in
$_.[271] So, to test a list of
filenames to see which ones are readable, you simply type:
foreach (@lots_of_filenames) {
    print "$_ is readable\n" if -r; # same as -r $_
}
But if you omit the parameter, be careful that whatever follows the
file test doesn't look like it could be a
parameter. For example, if you wanted to find out the size of a file
in K rather than in bytes, you might be tempted to divide the result
of -s by 1000 (or
1024), like this:
# The filename is in $_
my $size_in_K = -s / 1000; # Oops!
When the Perl parser sees the slash, it doesn't think about
division; since it's looking for the optional operand for
-s, it sees what looks like the start of a regular
expression in forward slashes. One simple way to prevent this kind of
confusion is to put
parentheses around the file test:
my $size_in_k = (-s) / 1024; # Uses $_ by default
Of course, it's always safe to explicitly give a file test a
parameter.
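Here is a small, self-contained sketch of the parenthesized form at work with the default $_ operand. The two files and their sizes are invented so that the arithmetic is easy to check.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Two hypothetical files of known size: 2048 and 5120 bytes
my %bytes_for = (two_k => 2048, five_k => 5120);
my @paths;
foreach my $name (sort keys %bytes_for) {
    my $path = "$dir/$name";
    open my $fh, '>', $path or die "Cannot create $path: $!";
    print {$fh} 'x' x $bytes_for{$name};
    close $fh or die "Cannot close $path: $!";
    push @paths, $path;
}

my %size_in_k;
foreach (@paths) {
    next unless -f;                    # same as -f $_
    $size_in_k{$_} = (-s) / 1024;      # parentheses keep the slash a division
    printf "%s is %.1f K\n", $_, $size_in_k{$_};
}
```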
11.6.1. The stat and lstat Functions
While these file tests are fine for testing various attributes
regarding a particular file or filehandle, they don't tell the
whole story. For example, there's no file test that returns the
number of links to a
file or the owner's
user-ID (uid).
To get at the remaining information about a file, merely call the
stat function, which returns pretty much
everything that the stat Unix system call returns (hopefully more
than you want to know).[272]
The operand to stat is a filehandle, or an
expression that evaluates to a filename. The return value is either
the empty list, indicating that the stat failed
(usually because the file doesn't exist), or a 13-element list
of numbers, most easily described using the following list of
scalar variables:
my($dev, $ino, $mode, $nlink, $uid, $gid, $rdev,
   $size, $atime, $mtime, $ctime, $blksize, $blocks)
    = stat($filename);
The names here refer to the parts of the stat structure, described in
detail in the stat(2) manpage. You should probably look there
for the detailed descriptions. But in short, here's a quick
summary of the important ones:
- $dev and $ino
  The device number and inode number of the file. Together they make up
  a "license plate" for the file. Even if it has more than
  one name (hard link), the combination of device and inode numbers
  should always be unique.
- $mode
  The set of permission bits for the file, and some other bits. If
  you've ever used the Unix command ls -l to
  get a detailed (long) file listing, you'll see that each line
  of output starts with something like -rwxr-xr-x.
  The nine letters and hyphens of file permissions[273] correspond to the nine least-significant bits of
  $mode, which would in this case give the octal
  number 0755. The other bits, beyond the lowest
  nine, indicate other details about the file. So if you need to work
  with the mode, you'll generally want to use the bitwise
  operators covered later in this chapter.
- $nlink
  The number of (hard) links to the file or directory. This is the
  number of true names that the item has. This number is always
  2 or more for directories and (usually)
  1 for files. We'll see more about this when
  we talk about creating links to files in Chapter 13, "Manipulating Files and Directories". In the listing from ls -l,
  this is the number just after the permission-bits string.
- $uid and $gid
  The numeric user-ID and group-ID showing the file's ownership.
- $size
  The size in bytes, as returned by the -s file test.
- $atime, $mtime, and $ctime
  The three timestamps, but here they're represented in the system's
  timestamp format: a 32-bit number telling how many seconds have passed
  since the Epoch, an arbitrary starting point for measuring system
  time. On Unix systems and some others, the Epoch is the beginning
  of 1970 at midnight Universal Time, but the Epoch is different on
  some machines. There's more information later in this chapter
  on turning that timestamp number into something useful.
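Putting the list above to work, this sketch stats a temporary file, checks for the empty list that signals failure, and pulls out a few fields with a list slice. The temporary file and the does-not-exist suffix are just for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "hello\n";
close $fh or die "Cannot close $path: $!";

# An empty list means the stat failed; $! says why
my @info = stat $path
    or die "Cannot stat $path: $!";

# Indices follow the order of the 13-element list: $mode is 2, $size is 7, ...
my ($mode, $nlink, $uid, $gid, $size, $mtime) = @info[2, 3, 4, 5, 7, 9];

printf "permissions: %04o\n",    $mode & 07777;  # low bits of the mode, in octal
printf "hard links:  %d\n",      $nlink;
printf "owner/group: %d/%d\n",   $uid, $gid;
printf "size:        %d bytes\n", $size;
printf "modified:    %d seconds after the Epoch\n", $mtime;

# A file that does not exist gives back the empty list
my @nothing = stat "$path.does-not-exist";
print "stat on a missing file returned ", scalar(@nothing), " elements\n";
```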
Invoking stat on the name of a
symbolic link
returns information on what the symbolic link points at, not
information about the symbolic link itself (unless the link just
happens to be pointing at nothing currently accessible). If you need
the (mostly useless) information about the symbolic link itself, use
lstat rather than stat
(which returns the same information in the same order). If the
operand isn't a symbolic link, lstat
returns the same things that stat would.
Like the file tests, the operand of stat or
lstat defaults to $_,
meaning that the underlying stat system call will be performed on the
file named by the scalar variable $_.
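The difference between the two functions only shows up on a symbolic link. In this sketch the file names and the 1000-byte size are invented, and the symlink call is wrapped in eval because not every system implements it.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir    = tempdir(CLEANUP => 1);
my $target = "$dir/target";
my $link   = "$dir/link";

open my $fh, '>', $target or die "Cannot create $target: $!";
print {$fh} 'x' x 1000;
close $fh or die "Cannot close $target: $!";

# For an ordinary file, stat and lstat agree
my $stat_size  = (stat  $target)[7];
my $lstat_size = (lstat $target)[7];
print "plain file: stat says $stat_size, lstat says $lstat_size\n";

# On a symbolic link, stat follows the link and lstat does not
if (eval { symlink $target, $link }) {
    my $through_link = (stat  $link)[7];   # size of the file pointed at
    my $link_itself  = (lstat $link)[7];   # size of the link entry itself
    print "symlink: stat says $through_link, lstat says $link_itself\n";
}
```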
11.6.2. The localtime Function
When you have a
timestamp number (such as the ones
from stat), it will typically look something like
1080630098. That's not very useful for most humans, unless you
need to compare two timestamps by subtracting. You may need to
convert it to something human-readable, such as a string like
"Tue Mar 30 07:01:38 2004" (the exact result depends on your
time zone). Perl can do that with the localtime function in a scalar
context:
my $timestamp = 1080630098;
my $date = localtime $timestamp;
In a list context, localtime returns a list of
numbers, several of which may not be quite what you'd expect:
my($sec, $min, $hour, $day, $mon, $year, $wday, $yday, $isdst)
    = localtime $timestamp;
The $mon is a month number, ranging from
0 to 11, which is handy as an
index into an array of month names. The $year is
the number of years since 1900, oddly enough, so add
1900 to get the real year number. The
$wday ranges from 0 (for
Sunday) through 6 (for Saturday), and the
$yday is the day-of-the-year (ranging from 0 for
January 1, through 364 or 365 for December 31).
There are two related functions that you'll also find useful.
The gmtime function is just the same as
localtime, except that it returns the time in
Universal
Time (what we once called Greenwich Mean Time). If you need the
current timestamp number from the system clock, just use the
time function. Both
localtime and gmtime default to
using the current time value if you don't
supply a parameter:
my $now = gmtime; # Get the current universal timestamp as a string
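The list-context quirks (zero-based months, years counted from 1900) are easiest to see in a small formatter. This sketch uses gmtime rather than localtime so that the result does not depend on the local time zone; the sub name format_utc and the output layout are invented for the example.

```perl
use strict;
use warnings;

my @month_name = qw/ Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec /;
my @day_name   = qw/ Sun Mon Tue Wed Thu Fri Sat /;

# Format an epoch timestamp as Universal Time.
sub format_utc {
    my ($timestamp) = @_;
    my ($sec, $min, $hour, $day, $mon, $year, $wday) = gmtime $timestamp;
    return sprintf '%s %d %s %d, %02d:%02d:%02d UTC',
        $day_name[$wday],       # 0 is Sunday
        $day,
        $month_name[$mon],      # 0 is January
        $year + 1900,           # years are counted from 1900
        $hour, $min, $sec;
}

print format_utc(1080630098), "\n";   # prints "Tue 30 Mar 2004, 07:01:38 UTC"
print format_utc(time), "\n";         # the current moment
```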
For more information on manipulating date and time information, see
the information about some useful modules in Appendix B, "Beyond the Llama".
11.6.3. Bitwise Operators
When you need to work with numbers
bit-by-bit, as when working with the mode bits returned by
stat, you'll need to use the bitwise
operators. The bitwise-and operator (&)
reports which bits are set in the left argument
and in the right argument. For example, the
expression 10 & 12 has the value
8. The bitwise-and needs to have a one-bit in both
operands to produce a one-bit in the result. That means that the
bitwise-and operation on ten (which is 1010 in
binary) and twelve (which is 1100) gives eight
(which is 1000, with a one-bit only where the left
operand has a one-bit and the right operand also
has a one-bit). See Figure 11-1.
Figure 11-1. Bitwise-and addition
The different bitwise operators and their meanings are shown in this
table:
Expression   Meaning
10 & 12      Bitwise-and -- which bits are true in both operands (this gives 8)
10 | 12      Bitwise-or -- which bits are true in one operand or the other (this gives 14)
10 ^ 12      Bitwise-xor -- which bits are true in one operand or the other but not both (this gives 6)
6 << 2       Bitwise shift left -- shift the left operand the number of bits shown by the right operand, adding zero-bits at the least-significant places (this gives 24)
25 >> 2      Bitwise shift right -- shift the left operand the number of bits shown by the right operand, discarding the least-significant bits (this gives 6)
~ 10         Bitwise negation, also called unary bit complement -- return the number with the opposite bit for each bit in the operand (this gives 0xFFFFFFF5 on a machine with 32-bit integers, but see the text)
So, here's an example of some things you could do with the
$mode returned by stat. The
results of these bit manipulations could be useful with
chmod, which we'll see in Chapter 13, "Manipulating Files and Directories":
# $mode is the mode value returned from a stat of CONFIG
warn "Hey, the configuration file is world-writable!\n"
    if $mode & 0002;                            # configuration security problem
my $classical_mode = 0777 & $mode;              # mask off extra high-bits
my $u_plus_x = $classical_mode | 0100;          # turn one bit on
my $go_minus_r = $classical_mode & (~ 0044);    # turn two bits off
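To see what those masks do, it helps to print the permission bits the way ls -l shows them. The following is a sketch only: the sub name permission_string is invented, and the starting mode 0100644 is a made-up value standing in for what stat might return for a plain file.

```perl
use strict;
use warnings;

# Turn the nine permission bits of a mode into ls-style letters.
sub permission_string {
    my ($mode) = @_;
    my @letters = qw/ r w x r w x r w x /;
    my $string  = '';
    foreach my $position (0 .. 8) {
        my $bit = 0400 >> $position;           # 0400, 0200, 0100, 0040, ...
        $string .= ($mode & $bit) ? $letters[$position] : '-';
    }
    return $string;
}

my $mode = 0100644;                            # hypothetical plain file, mode 0644
my $classical_mode = 0777 & $mode;             # 0644
my $u_plus_x   = $classical_mode | 0100;       # 0744
my $go_minus_r = $classical_mode & (~ 0044);   # 0600

foreach my $bits ($classical_mode, $u_plus_x, $go_minus_r) {
    printf "%04o %s\n", $bits, permission_string($bits);
}
```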
11.6.5. Using the Special Underscore Filehandle
Every time you use stat,
lstat, or a file test in a program, Perl has to go
out to the system to ask for a
stat
buffer on the file (that is, the return buffer from the stat system
call). That means if you want to know whether a file is both readable
and writable, you've essentially asked the system twice for the
same information (which isn't likely to change in a fairly
nonhostile environment).
This looks like a waste of time,[274] and in fact, it can be avoided. Doing a file test,
stat, or
lstat on the special
_
filehandle (that is, the operand is nothing but a single underscore)
tells Perl to use whatever happened to be lounging around in memory
from the previous file test, stat, or
lstat function, rather than going out to the
operating system again. Sometimes this is dangerous: a subroutine
call can invoke stat without your knowledge,
blowing your buffer away. But if you're careful, you can save
yourself a few unneeded system calls, thereby making your program
considerably faster. Here's that example of finding files to
put on the backup tapes again, using the new tricks we've
learned:
my @original_files = qw/ fred barney betty wilma pebbles dino bamm-bamm /;
my @big_old_files; # The ones we want to put on backup tapes
foreach (@original_files) {
    push @big_old_files, $_
        if (-s) > 100_000 and -A _ > 90; # More efficient than before
}
Note that we used the default of $_ for the first
test -- this is no more efficient (except perhaps for the
programmer), but it gets the data from the operating system. The
second test uses the magic _ filehandle; for this
test, the data left around after getting the file's size is
used, which is exactly what we want.
Note that testing the _ filehandle is not the same
as allowing the operand of a file test, stat, or
lstat to default to testing
$_; using $_ would be a fresh
test each time on the current file named by the contents of
$_, but using _ saves the
trouble of calling the system again. Here is another case where
similar names were chosen for radically different functions. By now,
you are probably used to it.
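The same idea extends to any number of tests on one file: ask the system once, then reuse the buffer. In this sketch the sub name describe, its report format, and the 300-byte temporary file are all invented for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} 'x' x 300;
close $fh or die "Cannot close $path: $!";

# Describe a file using one real stat; every later test reuses the buffer.
sub describe {
    my ($file) = @_;
    return "$file: no such file" unless -e $file;   # asks the system
    my @facts;
    push @facts, 'plain file' if -f _;              # reuses the stat buffer
    push @facts, 'directory'  if -d _;
    push @facts, 'readable'   if -r _;
    push @facts, 'writable'   if -w _;
    push @facts, ((-s _) || 0) . ' bytes';
    return "$file: " . join ', ', @facts;
}

print describe($path), "\n";
print describe("$path.does-not-exist"), "\n";
```

Nothing between the -e test and the last use of _ calls another subroutine, so the buffer cannot be overwritten behind our back.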
Copyright © 2002 O'Reilly & Associates. All rights reserved.