Filehandles and File Tests (Learning Perl, 3rd Edition)

11.1. What Is a Filehandle?

A filehandle is the name in a Perl program for an I/O connection between your Perl process and the outside world. That is, it's the name of a connection, not necessarily the name of a file.

Filehandles are named like other Perl identifiers (letters, digits, and underscores, but they can't start with a digit), but since they don't have any prefix character, they might be confused with present or future reserved words, as we saw with labels. Once again, as with labels, the recommendation from Larry is that you use all uppercase letters in the name of your filehandle -- not only will it stand out better, but it will also guarantee that your program won't fail when a future (lowercase) reserved word is introduced.

But there are also six special filehandle names that Perl already uses for its own purposes: STDIN, STDOUT, STDERR, DATA, ARGV, and ARGVOUT.[243] Although you may choose any filehandle name you'd like, you shouldn't choose one of those six unless you intend to use that one's special properties.[244]

[243]Some people hate typing in all-caps, even for a moment, and will try spelling these in lowercase, like stdin. Perl may even let you get away with that from time to time, but not always. The details of when these work and when they fail are beyond the scope of this book. But the important thing is that programs that rely upon this kindness will one day break, so it is best to avoid lowercase here.

[244]In some cases, you could (re-)use these names without a problem. But your maintenance programmer may think that you're using the name for its built-in features, and thus may be confused.

Maybe you recognized some of those names already. When your program starts, STDIN is the filehandle naming the connection between the Perl process and wherever the program should get its input, known as the standard input stream. This is generally the user's keyboard unless the user asked for something else to be the source of input, such as reading the input from a file or reading the output of another program through a pipe.[245]

[245]The defaults we speak of in this chapter for the three main I/O streams are what the Unix shells do by default. But it's not just shells that launch programs, of course. We'll see in Chapter 14, "Process Management," what happens when you launch another program from Perl.

There's also the standard output stream, which is STDOUT. By default, this one goes to the user's display screen, but the user may send the output to a file or to another program, as we'll see shortly. These standard streams come to us from the Unix "standard I/O" library, but they work in much the same way on most modern operating systems.[246] The general idea is that your program should blindly read from STDIN and blindly write to STDOUT, trusting in the user (or generally whichever program is starting your program) to have set those up. In that way, the user can type a command like this one at the shell prompt:

$ ./your_program <dino >wilma

That command tells the shell that the program's input should be read from the file dino, and the output should go to the file wilma. As long as the program blindly reads its input from STDIN, processes it (in whatever way we need), and blindly writes its output to STDOUT, this will work just fine.

[246]If you're not already familiar with how your non-Unix system provides standard input and output, see the perlport manpage and the documentation for that system's equivalent to the Unix shell (the program that runs programs based upon your keyboard input).
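As a minimal sketch of what "blindly" reading and writing can look like, here's a program that could stand in for your_program. It assumes the processing we want is simply converting each line to uppercase; the shout subroutine is a made-up name for this illustration, not something Perl provides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the "processing" step, here just uppercasing a line.
sub shout {
    my ($line) = @_;
    return uc $line;
}

# Read from STDIN until end of input, wherever that input comes from
# (keyboard, file, or pipe), and write each result to STDOUT.
while (defined(my $line = <STDIN>)) {
    print STDOUT shout($line);
}
```

Run with the command line shown earlier, a program like this should leave an uppercased copy of dino in wilma, and it never needs to know either filename.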

And at no extra charge, the program will work in a pipeline. This is another concept from Unix, which lets us write command lines like this one:

$ cat fred barney | sort | ./your_program | grep something | lpr

Now, if you're not familiar with these Unix commands, that's okay. This line says that the cat command should print out all of the lines of file fred followed by all of the lines of file barney. Then that output should be the input of the sort command, which sorts those lines and passes them on to your_program. After it has done its processing, your_program will send the data on to grep, which discards certain lines in the data, sending the others on to the lpr command, which should print everything that it gets on a printer. Whew!

But pipelines like that are common in Unix and many other systems today because they let you put together a powerful, complex command out of simple, standard building blocks.

There's one more standard I/O stream. If (in the previous example) your_program had to emit any warnings or other diagnostic messages, those shouldn't go down the pipeline. The grep command is set to discard anything that it hasn't specifically been told to look for, and so it will most likely discard the warnings. Even if it did keep the warnings, we probably don't want those to be passed downstream to the other programs in the pipeline. So that's why there's also the standard error stream: STDERR. Even if the standard output is going to another program or file, the errors will go to wherever the user desires. By default, the errors will generally go to the user's display screen,[247] but the user may send the errors to a file with a shell command like this one:

$ netstat | ./your_program 2>/tmp/my_errors

[247]Also, generally, errors aren't buffered. That means that if the standard error and standard output streams are both going to the same place (such as the monitor), the errors may appear earlier than the normal output. For example, if your program prints a line of ordinary text, then tries to divide by zero, the output may show the message about dividing by zero first, and the ordinary text second.
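Here's a small sketch of how your_program might keep its diagnostics separate from its ordinary output. It assumes, purely for illustration, that blank lines are the thing worth complaining about; is_valid is a hypothetical helper name, not a Perl built-in.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical check: a line is "valid" if it has any non-whitespace character.
sub is_valid {
    my ($line) = @_;
    return $line =~ /\S/ ? 1 : 0;
}

while (defined(my $line = <STDIN>)) {
    if (is_valid($line)) {
        # Ordinary output goes to STDOUT, so it follows any redirection or pipe.
        print STDOUT $line;
    } else {
        # Diagnostics go to STDERR; $. holds the current input line number.
        print STDERR "skipping blank line $.\n";
    }
}
```

With a redirection like 2>/tmp/my_errors, the "skipping" messages should land in that file, while the valid lines still go wherever standard output is pointing.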
