Chapter 2 The Gory Details
The following names have special meaning to Perl. Most of the
punctuational names have reasonable mnemonics, or analogs in one of
the shells. Nevertheless, if you wish to use the long variable names,
just say:
use English;
at the top of your program. This will alias all the short names to the
long names in the current package. Some of them even have medium names,
generally borrowed from awk(1).
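As a minimal sketch of what the aliasing means in practice, the following checks that a long name and its punctuation form refer to the same variable. The helper variable names ($pid_matches, $arg_matches) are illustrative only.

```perl
use strict;
use warnings;
use English;

# The long names are aliases, not copies: both spellings see the same value.
my $pid_matches = ($PROCESS_ID == $$) ? 1 : 0;

my $arg_matches = 0;
for ('some text') {
    # Inside the loop, $_ and $ARG name the same element.
    $arg_matches = ($ARG eq $_) ? 1 : 0;
}

print "pid alias ok: $pid_matches, arg alias ok: $arg_matches\n";
```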
A few of these variables are considered read-only. This means that if
you try to assign to one of them, either directly or indirectly through
a reference, you'll raise a run-time exception.
There are several variables that are associated with regular expressions
and pattern matching. Except for $* they are always local to the
current block, so you never need to mention them in a local. (And
$* is deprecated, so you never need to mention it at all.)
- $digit
-
Contains the text matched by the corresponding set of parentheses in
the last pattern matched, not counting patterns matched in nested
blocks that have been exited already. (Mnemonic: like \digit.)
These variables are all read-only.
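The block scoping described above can be sketched as follows. The sample strings and variable names are made up for illustration; the point is that $1 set inside a nested block disappears again once that block has been exited.

```perl
use strict;
use warnings;

# Capture three fields from a date-like string.
my $date = '1996-09-01';
my ($year, $month, $day);
if ($date =~ /^(\d+)-(\d+)-(\d+)$/) {
    ($year, $month, $day) = ($1, $2, $3);
}

# A match inside a nested block does not leak out of that block.
my ($inner, $after_block);
if ('outer text' =~ /(outer)/) {
    {
        $inner = $1 if 'inner text' =~ /(inner)/;   # $1 is 'inner' here
    }
    $after_block = $1;                              # $1 is 'outer' again
}

print "$year/$month/$day inner=$inner after=$after_block\n";
```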
- $&
$MATCH
-
The string matched by the last successful pattern match, not counting any
matches hidden within a block or eval enclosed by the
current block. (Mnemonic: like & in some editors.) This
variable is read-only.
- $`
$PREMATCH
-
The string preceding whatever was matched by the last successful pattern
match not counting any matches hidden within a block or eval
enclosed by the current block. (Mnemonic: ` often precedes a
quoted string.) This variable is read-only.
- $'
$POSTMATCH
-
The string following whatever was matched by the last successful pattern
match not counting any matches hidden within a block or eval
enclosed by the current block.
(Mnemonic: ' often follows a quoted
string.) Example:
$_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n"; # prints abc:def:ghi
This variable is read-only.
- $+
$LAST_PAREN_MATCH
-
The last bracket matched by the last search pattern. This is useful if you
don't know which of a set of alternative patterns matched. For example:
/Version: (.*)|Revision: (.*)/ && ($rev = $+);
(Mnemonic: be positive and forward looking.) This variable is read-only.
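Here is a small sketch of the same idea wrapped in a subroutine, so that both alternatives can be exercised. The name release_id and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Return whatever the alternative that matched captured, or undef if
# neither alternative matched.
sub release_id {
    my ($line) = @_;
    return $line =~ /Version: (.*)|Revision: (.*)/ ? $+ : undef;
}

my $from_version  = release_id('Version: 5.003');   # captured by $1
my $from_revision = release_id('Revision: 1.42');   # captured by $2
my $no_match      = release_id('nothing to see');

print "$from_version $from_revision\n";
```

Without $+ you would have to test $1 and $2 in turn to find out which one was set.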
- $*
$MULTILINE_MATCHING
-
Use of $* is now deprecated, and is allowed only for maintaining backwards
compatibility with older versions of Perl. Use /m (and maybe /s)
in the regular expression match instead.
Set to 1 to do multi-line matching within a string, 0 to tell Perl that it
can assume that strings contain a single line for the purpose of
optimizing pattern matches. Pattern matches on strings containing multiple
newlines can produce confusing results when $* is 0. Default is 0.
(Mnemonic: * matches multiple things.) Note that this variable only
influences the interpretation of ^ and $. A literal
newline can be searched for even when $* == 0.
These variables never need to be mentioned in a local
because they always refer to some value
pertaining to the currently selected output filehandle--each
filehandle keeps its own set of values. When you select another filehandle, the old filehandle
keeps whatever values it had in effect, and the variables now reflect
the values of the new filehandle.
To go a step further and avoid select
entirely, these variables that depend on the currently selected
filehandle may instead be set by calling an object method on the
FileHandle object. (Summary lines below for this contain the word
HANDLE.) First you must say:
use FileHandle;
after which you may use either:
method HANDLE EXPR
or:
HANDLE->method(EXPR)
Each of the methods returns the old value of the FileHandle attribute.
The methods each take an optional EXPR, which if supplied specifies the
new value for the FileHandle attribute in question. If not supplied,
most of the methods do nothing to the current value, except for
autoflush, which will assume a 1 for you, just to be different.
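The two idioms can be compared side by side in a short sketch. It uses STDERR and the autoflush method only, and the variable names are illustrative. The return values show that each call hands back the previous setting, and that autoflush with no argument assumes 1.

```perl
use strict;
use warnings;
use FileHandle;

# Idiom 1: select the handle, set the variable, restore the old selection.
my $previously_selected = select(STDERR);
$| = 1;
select($previously_selected);

# Idiom 2: call the method on the handle; no select is needed.
my $after_select_idiom = STDERR->autoflush(0);   # returns old value 1, sets 0
my $after_explicit_off = STDERR->autoflush;      # returns old value 0, sets 1
my $after_default_on   = STDERR->autoflush(0);   # returns old value 1, sets 0

print "$after_select_idiom $after_explicit_off $after_default_on\n";
```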
- $|
$OUTPUT_AUTOFLUSH autoflush HANDLE EXPR
-
If set to nonzero, forces an fflush(3) after every write or
print on the currently selected output channel. (This is called
"command buffering". Contrary to popular belief, setting this variable
does not turn off buffering.) Default is 0, which on many systems
means that STDOUT will default to being line buffered if output is to
the terminal, and block buffered otherwise. Setting this variable is
useful primarily when you are outputting to a pipe, such as when you are
running a Perl script under rsh and want to see the output as it's
happening. This has no effect on input buffering. If you have a need to
flush a buffer immediately after setting $|,
you may simply print ""; rather than waiting for the
next print to flush it. (Mnemonic: when you
want your pipes to be piping hot.)
- $%
$FORMAT_PAGE_NUMBER format_page_number HANDLE EXPR
-
The current page number of the currently selected output channel.
(Mnemonic: % is page number in nroff.)
- $=
$FORMAT_LINES_PER_PAGE format_lines_per_page HANDLE EXPR
-
The current page length (printable lines) of the currently selected output
channel. Default is 60. (Mnemonic: = has horizontal lines.)
- $-
$FORMAT_LINES_LEFT format_lines_left HANDLE EXPR
-
The number of lines left on the page of the currently selected output
channel. (Mnemonic: lines_on_page - lines_printed.)
- $~
$FORMAT_NAME format_name HANDLE EXPR
-
The name of the current report format for the currently selected output
channel. Default is name of the filehandle. (Mnemonic: takes a turn after
$^.)
- $^
$FORMAT_TOP_NAME format_top_name HANDLE EXPR
-
The name of the current top-of-page format for the currently selected
output channel. Default is name of the filehandle with _TOP appended.
(Mnemonic: points to top of page.)
There are quite a few variables that are global in the fullest
sense--they mean the same thing in every package. If you want a
private copy of one of these, you must localize it in the current
block.
- $_
$ARG
-
The default input and pattern-searching space. These pairs are
equivalent:
while (<>) {...} # only equivalent in while!
while (defined($_ = <>)) {...}
/^Subject:/
$_ =~ /^Subject:/
tr/a-z/A-Z/
$_ =~ tr/a-z/A-Z/
chop
chop($_)
Here are the places where Perl will assume $_ even if you don't use
it:
- Various unary functions, including functions like ord and
int, as well as all the file tests (-f, -d) except for
-t, which defaults to STDIN.
- Various list functions like print and unlink.
- The pattern-matching operations m//, s///, and tr///
when used without an =~ operator.
- The default iterator variable in a foreach loop if no other
variable is supplied.
- The implicit iterator variable in the grep and map
functions.
- The default place to put an input record when a <FH> operation's
result is tested by itself as the sole criterion of a while test.
Note that outside of a while test, this
will not happen.
Mnemonic: underline is the underlying operand in certain operations.
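Several of the defaults listed above can be seen together in a short sketch. The sample data is invented; every operation below works on $_ without naming it.

```perl
use strict;
use warnings;

my @lines = ("alpha\n", "Beta\n", "gamma\n");

# foreach aliases each element to $_, and chomp defaults to $_,
# so this strips the newlines in place.
for (@lines) {
    chomp;
}

# map and grep set $_ to each element in turn.
my @upper  = map  { uc }  @lines;   # uc defaults to $_
my @with_m = grep { /m/ } @lines;   # a bare match tests $_

print "@upper | @with_m\n";
```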
- $.
$INPUT_LINE_NUMBER $NR
-
The current input line number of the last filehandle that was read. An
explicit close on the filehandle resets the line number. Since <>
never does an explicit close, line numbers increase across
ARGV files (but see examples under eof in Chapter 3, Functions). Localizing
$. has the effect of also localizing Perl's notion of the last read
filehandle. (Mnemonic: many programs use "." to mean the current line
number.)
- $/
$INPUT_RECORD_SEPARATOR $RS
-
The input record separator, newline by default. It works like awk's
RS variable, and, if set to the null string, treats blank lines as
delimiters. You may set it to a multi-character string to match a
multi-character delimiter. Note that setting it to "\n\n" means
something slightly different than setting it to "", if the file
contains consecutive blank lines. Setting it to "" will treat two or
more consecutive blank lines as a single blank line. Setting it to
"\n\n" means Perl will blindly assume that the next input character belongs to
the next paragraph, even if it's a third newline. (Mnemonic: / is used to
delimit line boundaries when quoting poetry.)
undef $/;
$_ = <FH>; # whole file now here
s/\n[ \t]+/ /g;
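The difference between "" and "\n\n" is easiest to see on a string with extra blank lines. This sketch assumes a Perl that supports in-memory filehandles (opening a reference to a scalar); read_records is a hypothetical helper, not a built-in.

```perl
use strict;
use warnings;

# Read all records from a string using the given record separator.
sub read_records {
    my ($text, $separator) = @_;
    open(my $fh, '<', \$text) or die "cannot open in-memory handle: $!";
    local $/ = $separator;          # restored automatically on return
    my @records = <$fh>;
    close($fh);
    return @records;
}

# Two paragraphs separated by three blank lines.
my $text = "first paragraph\nstill first\n\n\n\nsecond paragraph\n";

my @paragraph_mode = read_records($text, "");      # blank-line runs collapse
my @double_newline = read_records($text, "\n\n");  # every "\n\n" ends a record
my @slurped        = read_records($text, undef);   # whole input in one record

printf "%d %d %d\n",
    scalar(@paragraph_mode), scalar(@double_newline), scalar(@slurped);
```

Using local inside the helper keeps the change to $/ from leaking into the rest of the program.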
- $,
$OUTPUT_FIELD_SEPARATOR $OFS
-
The output field separator for the print operator. Ordinarily the print
operator simply prints out the comma separated fields you specify. In
order to get behavior more like awk, set this variable as you would
set awk's OFS variable to specify what is printed between
fields. (Mnemonic: what is printed when there is a "," in your print
statement.)
- $\
$OUTPUT_RECORD_SEPARATOR $ORS
-
The output record separator for the print operator. Ordinarily the
print operator simply prints out the comma-separated fields you
specify, with no trailing newline or record separator assumed. In
order to get behavior more like awk, set this variable as you would
set awk's ORS variable to specify what is printed at the end
of the print. (Mnemonic: you set $\ instead of adding "\n" at the
end of the print. Also, it's just like /, but it's what you get "back"
from Perl.)
- $"
$LIST_SEPARATOR
-
This is like $, above except that it applies to list values interpolated
into a double-quoted string (or similar interpreted string). Default
is a space. (Mnemonic: obvious, I think.)
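The three output separators can be compared in one sketch. It prints to an in-memory filehandle so the result can be inspected, which assumes a Perl that supports opening a reference to a scalar; the field values are invented.

```perl
use strict;
use warnings;

my @fields = qw(alice 1001 /home/alice);

# The list separator applies only to arrays interpolated into strings.
my $interpolated;
{
    local $" = ':';
    $interpolated = "@fields";
}

# The output field and record separators apply to print.
my $printed = '';
open(my $out, '>', \$printed) or die "cannot open in-memory handle: $!";
{
    local $, = ':';     # printed between the items
    local $\ = "\n";    # printed after the last item
    print $out @fields;
}
close($out);

print "$interpolated\n$printed";
```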
- $;
$SUBSCRIPT_SEPARATOR $SUBSEP
-
The subscript separator for multi-dimensional array emulation. If you
refer to a hash element as:
$foo{$a,$b,$c}
it really means:
$foo{join($;, $a, $b, $c)}
But don't put:
@foo{$a,$b,$c} # a slice--note the @
which means:
($foo{$a},$foo{$b},$foo{$c})
Default is "\034", the same as SUBSEP in awk. Note that if your
keys contain binary data there might not be any safe value for $;.
(Mnemonic: comma--the syntactic subscript separator--is a
semi-semicolon. Yeah, I know, it's pretty lame, but $, is already
taken for something more important.)
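A brief sketch of the emulation, with made-up names (%cell, %by_row), shows that the comma form and the explicit join address the same hash slot. A nested hash is included for comparison.

```perl
use strict;
use warnings;

# Emulated two-dimensional subscript: the key is join($;, 2, 3).
my %cell;
$cell{2, 3} = 'queen';

my $joined_key = join($;, 2, 3);
my $same_slot  = exists $cell{$joined_key} ? 1 : 0;

# Recover the subscripts by splitting a key on the separator.
my $separator   = $;;
my ($row, $col) = split /\Q$separator\E/, (keys %cell)[0];

# The same data as a nested hash, which needs no separator at all.
my %by_row;
$by_row{2}{3} = 'queen';

print "same slot: $same_slot, row $row, col $col\n";
```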
This variable is for maintaining backward compatibility, so consider using
"real" multi-dimensional arrays now.
- $^L
$FORMAT_FORMFEED format_formfeed HANDLE EXPR
-
What a format outputs to perform a formfeed. Default is "\f".
- $:
$FORMAT_LINE_BREAK_CHARACTERS format_line_break_characters HANDLE EXPR
-
The current set of characters after which a string may be broken to fill
continuation fields (starting with ^) in a format. Default is " \n-",
to break on whitespace or hyphens. (Mnemonic:
a colon in poetry is a part of a line.)
- $^A
$ACCUMULATOR
-
The current value of the write
accumulator for format lines. A format
contains formline commands that put
their result into $^A. After calling
its format, write prints out the
contents of $^A and empties it. So you
never actually see the contents of $^A
unless you call formline yourself and
then look at it.
- $#
$OFMT
-
Use of $# is now deprecated and is allowed only for maintaining backwards
compatibility with older versions of Perl. You should use printf instead. $# contains the output format for printed numbers. This variable is a half-hearted
attempt to emulate awk's OFMT variable. There are times, however,
when awk and Perl have differing notions of what is in fact numeric.
Also, the initial value is approximately %.14g rather than %.6g, so you
need to set $# explicitly to get awk's value. (Mnemonic: # is the
number sign. Better yet, just forget it.)
- $?
$CHILD_ERROR
-
The status returned by the last pipe close, backtick (``) command,
or system operator. Note that this is the status word returned by
the wait(2) system call, so the exit value of the subprocess is actually
($? >> 8). Thus on many systems, ($? & 255) gives which signal,
if any, the process died from, and whether there was a core dump.
(Mnemonic: similar to sh and ksh.)
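Decoding the status word might look like the following sketch. It runs a child Perl via $^X purely to have a subprocess with a known exit value. The split into a 7-bit signal number and a core-dump bit is the usual layout on UNIX systems, not something guaranteed everywhere.

```perl
use strict;
use warnings;

# Run a child process that exits with status 3.
system($^X, '-e', 'exit 3');

my $exit_code   = $? >> 8;              # high byte: the exit value
my $signal      = $? & 127;             # low 7 bits: signal number, if any
my $dumped_core = ($? & 128) ? 1 : 0;   # next bit: core dump flag

print "exit=$exit_code signal=$signal core=$dumped_core\n";
```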
- $!
$OS_ERROR $ERRNO
-
If used in a numeric context, yields the current value of the
errno variable (identifying the last system call error) in the
currently executing perl, with
all the usual caveats. (This means that you shouldn't depend on the value
of $! to be anything in particular unless you've gotten a specific
error return indicating a system error.) If used in a string context,
yields the corresponding system error string. You can assign to $!
in order to set errno, if, for instance, you want $! to return
the string for error n, or you want to set the exit value for the
die operator. (Mnemonic: What just went bang?)
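The numeric and string faces of $! can be sketched like this. The path is assumed not to exist, and the Errno module is used only to name the expected error code symbolically.

```perl
use strict;
use warnings;
use Errno qw(ENOENT);

# Assumed to be a path that does not exist on this machine.
my $missing = '/nonexistent-dir-for-demo/missing.txt';

my ($errno, $message);
if (!open(my $fh, '<', $missing)) {
    # Only trust $! right after a call that reported failure.
    ($errno, $message) = ($! + 0, "$!");
}

# Assigning a number to $! lets you look up the text for that error.
$! = ENOENT;
my $looked_up = "$!";

print "errno $errno: $message\n";
```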
- $@
$EVAL_ERROR
-
The Perl syntax error message from the last eval command. If null,
the last eval was parsed and executed correctly (although the operations
you invoked may have failed in the normal fashion). (Mnemonic: Where was
the syntax error "at"?)
Note that warning messages are not collected in this variable. You can,
however, set up a routine to process warnings by setting
$SIG{__WARN__}, described below.
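A short sketch of checking $@ after each form of eval follows; the messages and variable names are invented. Copying $@ right away avoids losing it to a later eval.

```perl
use strict;
use warnings;

# A block that succeeds leaves $@ empty.
my $value = eval { 6 * 7 };
my $clean = $@;

# A run-time exception ends up in $@, and eval returns undef.
my $failed = eval { die "out of cheese\n"; 1 };
my $error  = $@;

# A string that does not parse also reports through $@.
my $parsed       = eval "1 +;";
my $syntax_error = $@;

print "value=$value error=$error";
```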
- $$
$PROCESS_ID $PID
-
The process number of the Perl running this script. (Mnemonic: same
as shells.)
- $<
$REAL_USER_ID $UID
-
The real user ID (uid) of this process. (Mnemonic: it's the uid you came
from, if you're running setuid.)
- $>
$EFFECTIVE_USER_ID $EUID
-
The effective uid of this process. Example:
$< = $>; # set real to effective uid
($<,$>) = ($>,$<); # swap real and effective uid
(Mnemonic: it's the uid you went to, if you're running
setuid.) Note: $< and $> can only be swapped on machines
supporting setreuid(2). And sometimes not even then.
- $(
$REAL_GROUP_ID $GID
-
The real group ID (gid) of this process. If you are on a machine that supports
membership in multiple groups simultaneously, gives a space-separated
list of groups you are in. The first number is the one returned by
getgid(2), and the subsequent ones by getgroups(2), one of which
may be the same as the first number. (Mnemonic: parentheses are used to
group things. The real gid is the group you
left, if you're running setgid.)
- $)
$EFFECTIVE_GROUP_ID $EGID
-
The effective gid of this process. If you are on a machine that
supports membership in multiple groups simultaneously, $) gives a
space-separated list of groups you are in. The first number is the
one returned by getegid(2), and the subsequent
ones by getgroups(2), one of which may be the
same as the first number. (Mnemonic: parentheses are used to
group things. The effective gid is the group
that's right for you, if you're running setgid.)
Note: $<, $>, $(, and
$) can only be set on machines that
support the corresponding system set-id routine. $( and $) can only
be swapped on machines supporting setregid(2).
Because Perl doesn't currently use initgroups(2),
you can't set your group vector to multiple groups.
- $0
$PROGRAM_NAME
-
Contains the name of the file containing the Perl script being executed.
Assigning to $0 attempts to modify the argument area that the
ps(1) program sees. This is more useful as a way of indicating the
current program state than it is for hiding the program you're running.
But it doesn't work on all systems. (Mnemonic: same as sh and
ksh.)
- $[
-
The index of the first element in an array, and of the first character in
a substring. Default is 0, but you could set it to 1 to make Perl
behave more like awk (or FORTRAN) when
subscripting and when evaluating the index and substr
functions. (Mnemonic: [ begins subscripts.)
Assignment to $[ is now treated as a compiler directive, and cannot
influence the behavior of any other file. Its use is discouraged.
- $]
$PERL_VERSION
-
Returns the version + patchlevel / 1000. It can be used to determine at
the beginning of a script whether the Perl interpreter executing the script
is in the right range of versions. Example:
warn "No checksumming!\n" if $] < 3.019;
die "Must have prototyping available\n" if $] < 5.003;
(Mnemonic: Is this version of Perl in the right bracket?)
- $^D
$DEBUGGING
-
The current value of the debugging flags. (Mnemonic: value of -D
switch.)
- $^F
$SYSTEM_FD_MAX
-
The maximum system file descriptor, ordinarily 2. System file
descriptors are passed to exec'ed
processes, while higher file descriptors are not. Also, during an
open, system file descriptors are
preserved even if the open fails.
(Ordinary file descriptors are closed before the open is attempted, and stay closed if the
open fails.) Note that the
close-on-exec status of a file descriptor will be decided according to
the value of $^F at the time of the
open, not the time of the exec.
- $^H
-
This variable contains internal compiler hints enabled by certain
pragmatic modules. Hint: ignore this and use the pragmata.
- $^I
$INPLACE_EDIT
-
The current value of the inplace-edit extension. Use undef to disable
inplace editing. (Mnemonic: value of -i switch.)
- $^O
$OSNAME
-
This variable contains the name of the operating system the current
Perl binary was compiled for. It's intended as a cheap alternative
to pulling it out of the Config module.
- $^P
$PERLDB
-
The internal flag that the debugger clears so that it doesn't debug
itself. You could conceivably disable debugging yourself by clearing
it.
- $^T
$BASETIME
-
The time at which the script began running, in seconds since the epoch
(the beginning of 1970, for UNIX systems). The values returned by the
-M, -A, and -C filetests are based on this value.
- $^W
$WARNING
-
The current value of the warning switch, either true or
false. (Mnemonic: the value is related to the -w switch.)
- $^X
$EXECUTABLE_NAME
-
The name that the Perl binary itself was executed as, from C's argv[0].
- $ARGV
-
Contains the name of the current file when reading from <ARGV>.
The following arrays and hashes are global. Just like the special global
scalar variables, they refer to package main no matter when they are
referenced. The following two statements are exactly the same:
print "@INC\n";
print "@main::INC\n";
- @ARGV
-
The array containing the command-line arguments intended for the
script. Note that $#ARGV is generally the number of arguments minus
one, since $ARGV[0] is the first argument, not the
command name. See $0 for the command name.
- @INC
-
The array containing the list of places to look for Perl scripts
to be evaluated by the do EXPR, require, or use
constructs. It initially consists of the arguments to any -I
command-line switches, followed by the default Perl libraries, such as:
/usr/local/lib/perl5/$ARCH/$VERSION
/usr/local/lib/perl5
/usr/local/lib/perl5/site_perl
/usr/local/lib/perl5/site_perl/$ARCH
followed by ".", to represent the
current directory. If you need to modify this list at run-time, you should use
the lib module in order to also get the machine-dependent library
properly loaded:
use lib '/mypath/libdir/';
use SomeMod;
- @F
-
The array into which the input lines are split when the -a
command-line switch is given. If the -a option is not used, this
array has no special meaning. (This array is actually only @main::F, and not
in all packages at once.)
- %INC
-
The hash containing entries for the filename of each file that has been
included via do or require. The key is the filename you
specified, and the value is the location of the file actually found. The
require command uses this hash to determine whether a given file has
already been included.
- %ENV
-
The hash containing your current environment. Setting a value in %ENV
changes the environment for child processes:
$ENV{PATH} = "/bin:/usr/bin";
To remove something from your environment, make sure
to use delete instead of undef.
Note that processes running as a crontab entry
inherit a particularly impoverished set of environment variables.
Also note that you should set $ENV{PATH},
$ENV{SHELL}, and $ENV{IFS} if
you are running as a setuid script. See Chapter 8, Other Oddments,
for more on security and setuid issues.
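To see the effect on child processes, here is a sketch that asks a child Perl whether a variable is present in its environment. The variable name CAMEL_DEMO and the helper child_sees are hypothetical, and the list form of a piped open is assumed to be available on your platform.

```perl
use strict;
use warnings;

# Ask a child perl whether the named variable exists in its environment.
sub child_sees {
    my ($name) = @_;
    open(my $child, '-|', $^X, '-e',
         'print exists $ENV{$ARGV[0]} ? "set" : "unset"', $name)
        or die "cannot start child: $!";
    my $answer = <$child>;
    close($child);
    return $answer;
}

$ENV{CAMEL_DEMO} = 'hump';
my $while_set = child_sees('CAMEL_DEMO');

delete $ENV{CAMEL_DEMO};        # removes the variable entirely
my $after_delete = child_sees('CAMEL_DEMO');

print "$while_set, then $after_delete\n";
```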
- %SIG
-
The hash used to set signal handlers for various signals. Example:
sub handler { # 1st argument is signal name
local($sig) = @_;
print "Caught a SIG$sig--shutting down\n";
close(LOG);
exit(0);
}
$SIG{INT} = 'handler';
$SIG{QUIT} = 'handler';
...
$SIG{INT} = 'DEFAULT'; # restore default action
$SIG{QUIT} = 'IGNORE'; # ignore SIGQUIT
The %SIG hash only contains values for the signals actually set
within the Perl script. Here are some other examples:
$SIG{PIPE} = Plumber; # SCARY!!
$SIG{PIPE} = "Plumber"; # just fine, assumes main::Plumber
$SIG{PIPE} = \&Plumber; # just fine; assume current Plumber
$SIG{PIPE} = Plumber(); # oops, what did Plumber() return??
The example marked SCARY!! is problematic because it's a bareword, which means
sometimes it's a string representing the function, and sometimes it's
going to call the subroutine right then and there! Best to be sure
and quote it or take a reference to it.
Certain internal hooks can also be set using the %SIG hash. The
routine indicated by $SIG{__WARN__} is called when a warning message
is about to be printed. The warning message is passed as the first
argument. The presence of a __WARN__ hook causes the ordinary
printing of warnings to STDERR to be suppressed. You can use this
to save warnings in a variable, or turn warnings into fatal errors, like
this:
local $SIG{__WARN__} = sub { die $_[0] };
eval $proggie;
The routine indicated by $SIG{__DIE__} is called
when a fatal exception is about to be thrown. The error message is
passed as the first argument. When a __DIE__ hook
routine returns, the exception processing continues as it would have
in the absence of the hook, unless the hook routine itself exits via a
goto, a loop exit, or a die. The __DIE__ handler is
explicitly disabled during the call, so that you yourself can then
call the real die from a
__DIE__ handler. (If it weren't disabled, the
handler would call itself recursively forever.) The case is similar for
__WARN__.
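Both hooks can be tried out in a few lines. This is only a sketch with invented messages; local is used so that each hook is removed again at the end of its block.

```perl
use strict;
use warnings;

# Save warnings in a variable instead of printing them to STDERR.
my @collected;
{
    local $SIG{__WARN__} = sub { push @collected, $_[0] };
    warn "low on memory\n";
}

# Turn warnings into fatal errors and catch them with eval.
my $fatal_message;
{
    local $SIG{__WARN__} = sub { die $_[0] };
    eval { warn "treated as fatal\n"; 1 };
    $fatal_message = $@;
}

# Observe an exception on its way out; the die still happens afterward.
my ($die_hook_saw, $still_thrown);
{
    local $SIG{__DIE__} = sub { $die_hook_saw = $_[0] };
    eval { die "real failure\n" };
    $still_thrown = $@;
}

print "collected: @collected";
```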
The following filehandles (except for DATA) always refer to
main::FILEHANDLE.
- ARGV
-
The special filehandle that iterates over command line filenames in
@ARGV. Usually written as the null filehandle in <>.
- STDERR
-
The special filehandle for standard error in any package.
- STDIN
-
The special filehandle for standard input in any package.
- STDOUT
-
The special filehandle for standard output in any package.
- DATA
-
The special filehandle that refers to anything following the
__END__ token in the file
containing the script. Or, the special filehandle for anything
following the __DATA__ token in a required file, as long as
you're reading data in the same package that the __DATA__ was
found in.
- _ (underline)
-
The special filehandle used to cache the information from the last stat,
lstat, or file test operator.
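As a sketch of why the cache is useful, the following tests one file several ways while looking it up only once. The temporary file comes from the standard File::Temp module, and the variable names are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file to examine.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "hello";
close($fh);

my ($is_plain, $size, $is_dir);
if (-e $path) {                    # this test does the real stat
    $is_plain = (-f _) ? 1 : 0;    # these reuse the cached result
    $size     = -s _;
    $is_dir   = (-d _) ? 1 : 0;
}

print "plain=$is_plain size=$size dir=$is_dir\n";
```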