Answers to Chapter 11 Exercises (Learning Perl, 3rd Edition)

A.10. Answers to Chapter 11 Exercises

Here's one way to do it:
```
sub get_line { 
  # prompts for, reads, chomps, and returns a line of input
  print $_[0];
  chomp(my $line = <STDIN>);
  $line;
}

my $source = &get_line("Which source file? ");
open IN, $source
  or die "Can't open '$source' for input: $!";

my $dest = &get_line("What destination file? ");
die "Won't overwrite existing file"  
  if -e $dest;  # optional safety test
open OUT, ">$dest"
  or die "Can't open '$dest' for output: $!";

my $pattern = &get_line("What search pattern: ");
my $replace = &get_line("What replacement string: ");

while (<IN>) {
  s/$pattern/$replace/g;
  print OUT $_;
}
```
This one needs to ask the user for several things, so we decided to make a subroutine to take care of some of the work. The subroutine prints out the prompt, which is the first (and only) parameter to the subroutine. Then it reads a line of input, chomps it, and returns it. That makes it easy to ask for each parameter, one after the other.

Once we know what the user wants for the source file, we try opening it. An earlier version of this program asked for all of the parameters first, but if the source file name is incorrect, there's no point in having the user type more parameters. This way can save the user some time and trouble. Note that the die message reports the file name inside quote marks; this can be helpful in diagnosing a problem when a string has whitespace characters. If you opened "<$source" instead of just plain $source, that's fine, too. (There's no reason to worry that the user of this program will do something nefarious, since anything they can do with this program, they could accomplish just as well without it. If this program were made to run over the web, to give one example, we'd need to be much more cautious about opening the user's choice of file.)

As we hope you discovered when you tried it, it's easy to overwrite an existing file simply by opening it for output. So we put in a safety test using the -e file test. The corresponding die message doesn't include $! because we're not reporting a failed request of the system. By the way, this test for overwrite is fine here, but it would be insufficient in an environment where many copies of the same program (or different programs all working with the same files) might be running at once. This typically happens with programs on the web: Two processes check the same filename for existence at approximately the same time, and both see that it doesn't exist. So one of them creates the file, and an instant later the other one overwrites it with a file of its own. This kind of concurrency problem can't be solved with the -e file test; some kind of file locking (which is beyond the scope of this book) is needed.
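
One way to narrow that race without full file locking is to ask the operating system to create the file only if it doesn't already exist, so that the check and the creation happen in a single step. The following is a minimal sketch of that alternative, not part of the exercise answer. The subroutine name open_new_file is made up for illustration, and the sketch assumes a system that honors the O_EXCL flag (it may not be reliable on some network filesystems).

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# Hypothetical helper: create $dest for writing, but fail if it already exists.
# With O_CREAT and O_EXCL together, the existence check and the creation are
# one request to the system, so two processes can't both succeed.
sub open_new_file {
  my $dest = shift;
  sysopen(my $out, $dest, O_WRONLY | O_CREAT | O_EXCL)
    or die "Can't create '$dest': $!\n";
  return $out;
}
```

Here the die message does include $!, because this time we are reporting a failed request of the system.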

With that safety test, the user won't accidentally overwrite an existing file. Is that test a good idea? Well, if the user comes to see you next week and says, "Golly, I'm glad you put in that safety test. It kept me from accidentally overwriting my file!", then you know that it was the right thing to do. But if the user says, "Dagnabit, your program is hard to use! I told it what filename I wanted to use for output, and it wouldn't let me use it until I first deleted that file!", then it was the wrong thing to do. Making decisions like this is often the toughest part of a programmer's job. Perhaps we should make the program ask something like, "Are you sure you want to overwrite the existing file 'barney'?" by default, but have a command-line option for the power user that says to overwrite without asking. Next version.

Once we've asked for everything and opened the files, the rest of the program is pretty simple. The heart of the program is the loop at the end, which reads lines, updates them, and prints them out. Note that the substitution uses the /g option -- if you left that out, your program is broken, since the exercise asked that the program replace every occurrence of the search pattern, not just the first one on each line.
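
To see the difference the /g option makes, here is a small illustration using a made-up line of text rather than a file:

```perl
use strict;
use warnings;

my $line = "fred and fred and fred";

# Without /g, only the first match on the line is replaced.
(my $first_only = $line) =~ s/fred/barney/;

# With /g, every match on the line is replaced.
(my $all = $line) =~ s/fred/barney/g;

print "$first_only\n";  # barney and fred and fred
print "$all\n";         # barney and barney and barney
```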

Were you able to use regular expression metacharacters in the search pattern? Sure; the substitution interpolates $pattern to make the search pattern. Were you able to use memory variables and backslash escapes in the replacement string? Nope; $replace is interpolated to make the replacement string, but it's not re-interpolated to interpret any magical characters. If $replace holds $1, that's a dollar sign and a numeral one in the replacement string. If Perl always kept re-interpolating, you could never put a dollar sign or backslash into the replacement string, since they'd always make something magical happen. (Actually, though, if you need one additional level of interpolation, it is possible. See the perlfaq manpages for some suggestions on how to do this.)
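
As a rough illustration of that last point, here is a sketch contrasting the plain substitution with one common idiom for the extra level of interpolation: the /ee option. This is only one possibility, and it comes with real caveats. It evaluates the user's replacement text as Perl code, so it's suitable only when you trust the input, and this version assumes the replacement string contains no double-quote characters. The sample strings are made up for the demonstration.

```perl
use strict;
use warnings;

my $pattern = '(\w+)@example';
my $replace = '$1 at example';   # a literal dollar sign and a numeral one
my $text    = 'fred@example';

# Ordinary substitution: $replace is interpolated once, and its
# contents are used as-is.
(my $plain = $text) =~ s/$pattern/$replace/g;

# With /ee, the replacement is a Perl expression whose result is then
# evaluated again as Perl code; here that result is a double-quoted string.
# CAUTION: this runs whatever code is in $replace.
(my $expanded = $text) =~ s/$pattern/qq{"$replace"}/ee;

print "$plain\n";     # $1 at example
print "$expanded\n";  # fred at example
```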

Here's one way to do it:
```
foreach my $file (@ARGV) {
  my $attribs = &attributes($file);
  print "'$file' $attribs.\n";
}

sub attributes {
  # report the attributes of a given file
  my $file = shift @_;
  return "does not exist" unless -e $file;

  my @attrib;
  push @attrib, "readable" if -r $file;
  push @attrib, "writable" if -w $file;
  push @attrib, "executable" if -x $file;
  return "exists" unless @attrib;
  'is ' . join " and ", @attrib;  # return value
}
```
In this one, once again it's convenient to use a subroutine. The main loop prints one line of attributes for each file, perhaps telling us that 'cereal-killer' is executable or that 'sasquatch' does not exist.

The subroutine tells us the attributes of the given filename. Of course, if the file doesn't even exist, there's no need for the other tests, so we test for that first. If there's no file, we'll return early.

If the file does exist, we'll build a list of attributes. (Give yourself extra credit points if you used the special _ filehandle instead of $file on these tests, to keep from calling the system separately for each new attribute.) It would be easy to add additional tests like the three we show here. But what happens if none of the attributes is true? Well, if we can't say anything else, at least we can say that the file exists, so we do. The unless clause uses the fact that @attrib will be true (in a Boolean context, which is a special case of a scalar context) if it's got any elements.
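
For the extra-credit version, a sketch might look like the following. The name attributes_cached is ours, chosen only to keep it distinct from the subroutine above. The -e test asks the system about the file once, and the later tests on the special _ filehandle reuse that information.

```perl
use strict;
use warnings;

sub attributes_cached {
  # report the attributes of a given file, asking the system only once
  my $file = shift @_;
  return "does not exist" unless -e $file;  # this test fetches the file's info

  my @attrib;
  push @attrib, "readable"   if -r _;  # _ reuses the info from the -e test
  push @attrib, "writable"   if -w _;
  push @attrib, "executable" if -x _;
  return "exists" unless @attrib;
  'is ' . join " and ", @attrib;  # return value
}
```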

But if we've got some attributes, we'll join them with " and " and put "is " in front, to make a description like "is readable and writable". This isn't perfect, however; if there are three attributes, it says that the file "is readable and writable and executable", which has too many ands, but we can get away with it. If you wanted to add more attributes to the ones this program checks for, you should probably fix it to say something like "is readable, writable, executable, and nonempty", if that matters to you.
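
If it does matter to you, one possible fix is a small helper that handles the one-, two-, and many-item cases separately. This is just a sketch; the name join_with_and is made up for illustration.

```perl
use strict;
use warnings;

sub join_with_and {
  # join a list into a phrase like "a", "a and b", or "a, b, and c"
  my @items = @_;
  return ''                   unless @items;
  return $items[0]            if @items == 1;
  return join ' and ', @items if @items == 2;
  my $last = pop @items;
  return join(', ', @items) . ", and $last";
}

print 'is ', join_with_and(qw(readable writable executable)), "\n";
# is readable, writable, and executable
```

The last line of the attributes subroutine could then become 'is ' . join_with_and(@attrib) instead of using join directly.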

Note that if you somehow didn't put any filenames on the command line, this produces no output. This makes sense; if you ask for information on zero files, you should get zero lines of output. But let's compare that to what the next program does in a similar case, in the discussion below.

Here's one way to do it:
```
die "No file names supplied!\n" unless @ARGV;
my $oldest_name = shift @ARGV;
my $oldest_age = -M $oldest_name;

foreach (@ARGV) {
  my $age = -M;
  ($oldest_name, $oldest_age) = ($_, $age)    
    if $age > $oldest_age;
}

printf "The oldest file was %s, and it was %.1f days old.\n",
  $oldest_name, $oldest_age;
```
This one starts right out by complaining if it didn't get any filenames on the command line. That's because it's supposed to tell us the oldest filename -- and there ain't one if there aren't any files to check.

Once again, we're using the "high-water-mark" algorithm. The first file is certainly the oldest one seen so far. We have to keep track of its age as well, so that's in $oldest_age.

For each of the remaining files, we'll determine the age with the -M file test, just as we did for the first one (except that here, we'll use the default argument of $_ for the file test). The last-modified time is generally what people mean by the "age" of a file, although you could make a case for using a different one. If the age is more than $oldest_age, we'll use a list assignment to update both the name and age. We didn't have to use a list assignment, but it's a convenient way to update several variables at once.

We stored the age from -M into the temporary variable $age. What would have happened if we had simply used -M each time, rather than using a variable? Well, first, unless we used the special _ filehandle, we would have been asking the operating system for the age of the file each time, a potentially slow operation (not that you'd notice unless you have hundreds or thousands of files, and maybe not even then). More importantly, though, we should consider what would happen if someone updated a file while we're checking it. That is, first we see the age of some file, and it's the oldest one seen so far. But before we can get back to use -M a second time, someone modifies the file and resets the timestamp to the current time. Now the age that we save into $oldest_age is actually the youngest age possible. The result would be that we'd get the oldest file among the files tested from that point on, rather than the oldest overall; this would be a tough problem to debug!

Finally, at the end of the program, we use printf to print out the name and age, with the age rounded off to the nearest tenth of a day. Give yourself extra credit if you went to the trouble to convert the age to a number of days, hours, and minutes.
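
For that extra credit, one possible approach is to convert the fractional days from -M into whole minutes first and then split that total up. The subroutine name age_in_dhm is made up for this sketch, and it assumes the age isn't negative (which could happen if a file was modified after the program started).

```perl
use strict;
use warnings;

sub age_in_dhm {
  # convert an age in (fractional) days to whole days, hours, and minutes
  my $age = shift;
  my $total_minutes = int($age * 24 * 60 + 0.5);  # round to the nearest minute
  my $days    = int($total_minutes / (24 * 60));
  my $hours   = int(($total_minutes % (24 * 60)) / 60);
  my $minutes = $total_minutes % 60;
  return ($days, $hours, $minutes);
}

my ($d, $h, $m) = age_in_dhm(1.5);
printf "%d days, %d hours, %d minutes\n", $d, $h, $m;
# prints "1 days, 12 hours, 0 minutes"
```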