The biggest goof of all is forgetting to use the
-w
switch, which points out many errors. The second
biggest goof is not using
use strict
when it's appropriate.
Apart from those, there are certain traps that almost everyone falls into, and
other traps you'll fall into only if you come from a particular culture. We've
separated these out in the following sections.
-
Putting a comma after the filehandle in a
print
statement.
Although it looks extremely regular and pretty to say:
print STDOUT, "goodbye", $adj, "world!\n"; # WRONG
this is nonetheless incorrect, because of that first comma. What you
want instead is:
print STDOUT "goodbye", $adj, "world!\n"; # ok
The syntax is this way so that you can say:
print $filehandle "goodbye", $adj, "world!\n";
where
$filehandle
is a scalar holding the name of a filehandle at
run-time. This is distinct from:
print $notafilehandle, "goodbye", $adj, "world!\n";
where
$notafilehandle
is simply a string that is added to the list
of things to be printed. See Indirect Object in the glossary.
-
Using
==
instead of
eq
and
!=
instead of
ne
. The
==
and
!=
operators are
numeric
tests. The other two are
string
tests. The strings
"123"
and
"123.00"
are
equal as numbers, but not equal as strings. Also, any non-numeric
string is numerically equal to zero. Unless you are dealing with
numbers, you almost always want the string comparison operators
instead.
-
Forgetting the trailing semicolon. Every statement in
Perl is terminated by a semicolon or the end of a block. Newlines
aren't statement terminators as they are in
awk
or Python.
-
Forgetting that a
BLOCK
requires braces. Naked
statements are not
BLOCK
s. If you are creating a
control structure such as a
while
or an
if
that requires one or more
BLOCK
s, you
must
use braces
around each
BLOCK
.
-
Not saving
$1
,
$2
, and so on, across regular expressions.
Remember that every new
m/atch/
or
s/ubsti/tute/
will set (or clear, or mangle) your
$1
,
$2
... variables, as well as
$`
,
$'
, and
$&
. One way to save them right
away is to evaluate the match within a list context, as in:
($one,$two) = /(\w+) (\w+)/;
-
Not realizing that a
local
also changes
the variable's value within other subroutines called within the scope
of the local. It's easy to forget that
local
is a run-time statement that does dynamic
scoping, because there's no equivalent in languages like C. See
local
in
Chapter 3,
Functions
.
Usually you wanted a
my
anyway.
-
Losing track of brace pairings.
A good text editor will help you find the pairs. Get one.
-
Using loop control statements in
do {} while
.
Although the braces in this control structure look suspiciously
like part of a loop
BLOCK
, they aren't.
-
Saying
@foo[1]
when you mean
$foo[1]
.
The
@foo[1]
reference is an array
slice
, and means an
array consisting of the single element
$foo[1]
.
Sometimes, this doesn't make any difference, as in:
print "the answer is @foo[1]\n";
but it makes a big difference for things like:
@foo[1] = <STDIN>;
which will slurp up all the rest of
STDIN
,
assign the
first
line to
$foo[1]
, and discard everything else. This is probably not what you
intended. Get into the habit of thinking that
$
means a single
value, while
@
means a list of values, and you'll do okay.
-
Forgetting to select the right filehandle before setting
$^
,
$~
, or
$|
. These variables depend on the
currently selected filehandle, as determined by
select
(
FILEHANDLE
).
The initial filehandle so selected is
STDOUT
. You
should really be using the filehandle methods from the FileHandle
module instead. See
Chapter 7,
The Standard Perl Library
.
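Several of the traps above can be seen in one short script. This is only a sketch: the names $level, show_level, with_local, and three_lines are made up for illustration, and the warning about the one-element slice is switched off on purpose so that the slice can be shown.

```perl
use strict;
use warnings;

# == compares numbers, eq compares strings.
my $num_equal = ("123" == "123.00") ? 1 : 0;   # 1: equal as numbers
my $str_equal = ("123" eq "123.00") ? 1 : 0;   # 0: not equal as strings

# Save the captures right away by matching in a list context.
my ($one, $two) = "hello world" =~ /(\w+) (\w+)/;

# local is dynamic: subroutines called inside its scope see the new value.
our $level = "outer";                          # hypothetical global
sub show_level { return $level }
sub with_local { local $level = "inner"; return show_level() }
my $during = with_local();                     # "inner"
my $after  = show_level();                     # "outer" again

# A one-element slice on the left still imposes a list context on the right.
sub three_lines { return ("first\n", "second\n", "third\n") }
my @foo = ("a", "b", "c");
{
    no warnings 'syntax';      # silence "better written as $foo[1]"
    @foo[1] = three_lines();   # keeps "first\n", discards the rest
}
```

With warnings left on, Perl itself flags the one-element slice, which is one more reason to run with them enabled.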
Practicing Perl Programmers should take note of the following:
-
Remember that many operations behave differently in a list context
than they do in a scalar one.
Chapter 3
has all the details.
-
Avoid barewords if you can, especially all lowercase ones.
You can't tell just by looking at it whether a word is
a function or a bareword string. By using quotes on strings and
parentheses around function call arguments, you won't ever get them confused.
In fact, the pragma
use strict
at the beginning of your program
makes barewords a compile-time error - probably a good thing.
-
You can't tell just by looking which built-in functions are unary
operators (like
chop
and
chdir
), which are list operators
(like
print
and
unlink
),
and which are argumentless (like
time
).
You'll want to learn them from
Chapter 2,
The Gory Details
. Note also
that user-defined subroutines are by default list operators, but can
be declared as unary operators with a prototype of
($)
.
-
People have a hard time remembering that some functions default to
$_
, or
@ARGV
, or whatever, while others do not. Take
the time to learn which are which, or avoid default arguments.
-
<FH>
is not the
name of a filehandle, but an angle operator that does a line-input
operation on the handle. This confusion usually manifests itself when
people try to
print
to the angle
operator:
print <FH> "hi"; # WRONG, omit angles
-
Remember also that data read by the angle operator is assigned to
$_
only when the file read is the sole
condition in a
while
loop:
while (<FH>) { }
while ($_ = <FH>) { }
<FH>; # data discarded!
-
Remember not to use
=
when you need
=~
;
the two constructs are quite different:
$x = /foo/; # searches $_, puts result in $x
$x =~ /foo/; # searches $x, discards result
-
Use
my
for local variables whenever you can get away with
it (but see "Formats" in
Chapter 2
for where you can't).
Using
local
actually gives a local value to a global
variable, which leaves you open to unforeseen side effects
of dynamic scoping.
-
Don't localize a module's exported variables. If you localize an
exported variable, its exported value will not change. The local name
becomes an alias to a new value but the external name is still an alias
for the original.
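The points about context, = versus =~, and the angle operator can be tried out without any real files. The sketch below reads from an in-memory string through a lexical filehandle ($fh) rather than the bareword FH used in the text; that choice, and all the variable names, are assumptions made for the example.

```perl
use strict;
use warnings;

# = versus =~
$_ = "seafood";
my $x = /foo/;                         # searches $_, puts result (1) in $x
my $y = "barstool";
my $found = ($y =~ /foo/) ? 1 : 0;     # searches $y; no match, so 0

# The same array gives different answers in scalar and list contexts.
my @words   = ("camel", "llama", "alpaca");
my $count   = @words;                  # scalar context: 3
my ($first) = @words;                  # list context: "camel"

# The angle operator sets $_ only as the sole condition of a while loop.
my $data = "one\ntwo\nthree\n";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
<$fh>;                                 # reads "one\n" and discards it
my $topic_after_discard = $_;          # still "seafood"

my @lines;
while (<$fh>) {                        # here each line does land in $_
    push @lines, $_;
}
close($fh);
```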
Accustomed
awk
users should take special note of the following:
-
The English module, loaded via
use English;
allows you to refer to special variables (like
$RS
) using
their
awk
names; see the end of
Chapter 2
for details.
-
Semicolons are required after all simple statements in Perl (except
at the end of a block). Newline is not a statement delimiter.
-
Braces are required on
if
and
while
blocks.
-
Variables begin with
$
,
@
, or
%
in Perl.
-
Arrays index from
0
, as do string positions in
substr
and
index
.
-
You have to decide whether your array has numeric or string indices.
-
You have to decide whether you want numeric or string comparisons.
-
Hash values do not spring into existence upon reference.
-
Reading an input line does not split it for you. You get to split it
yourself to an array. And the
split
operator has different
arguments than you might guess.
-
The current input line is normally in
$_
, not
$0
. It
generally does not have the newline stripped. (
$0
is the name of the program executed.) See
Chapter 2
.
-
$1
,
$2
, and so on, do not refer to fields - they
refer to substrings matched by the last pattern match.
-
The
print
operator
does not add field and record separators unless you set
$,
and
$\
.
(
$OFS
and
$ORS
if you're using
English.)
-
You must
open
your
files before you
print
to them.
-
The range operator is
..
rather than comma. The comma operator works (more or less) as it does in C.
-
The match binding operator is
=~
, not
~
.
(
~
is the 1's complement operator, as in C.)
-
The exponentiation operator is
**
, not
^
.
^
is the bitwise XOR operator, as in C. (You
know, one could get the feeling that
awk
is
basically incompatible with C.)
-
The concatenation operator is dot
(
.
), not "nothing". (Using "nothing" as an
operator would render
/pat/ /pat/
unparsable, since
the third slash would be interpreted as a division operator - the
tokener is in fact slightly context sensitive for operators like
/
,
?
, and
<
. And, in fact, a dot itself can be the
beginning of a number.)
-
The
next
,
exit
,
and
continue
keywords work differently.
-
The following variables work differently:
-
You cannot set
$RS
to a pattern, only a string.
-
When in doubt, run the
awk
construct through
a2p
and see what it
gives you.
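For readers arriving from awk, a few of the differences listed above are easier to see side by side. The following is a sketch rather than a translation of any particular awk program; the sample record and variable names are invented, and output is captured in a string through an in-memory filehandle so that the effect of the separators is visible.

```perl
use strict;
use warnings;

my $line = "alice 30 paris\n";        # hypothetical input record
chomp(my $record = $line);            # the newline is not stripped for you
my @fields = split ' ', $record;      # you split the line yourself
my $name   = $fields[0];              # fields are array elements, counted from 0
my $pos    = index($record, "30");    # string positions count from 0 too: 6

my $cube   = 2 ** 3;                  # exponentiation: 8
my $xor    = 2 ^ 3;                   # bitwise XOR: 1
my $joined = $fields[0] . "-" . $fields[2];   # concatenation needs the dot

# print adds separators only when $, and $\ are set.
my $out = "";
{
    local $, = ":";                   # output field separator (awk's OFS)
    local $\ = "\n";                  # output record separator (awk's ORS)
    open(my $fh, '>', \$out) or die "cannot open in-memory file: $!";
    print $fh @fields;
    close($fh);
}
# $out now holds "alice:30:paris\n"
```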
Cerebral C programmers should take note of the following:
-
Curlies are required for
if
and
while
blocks.
-
You must use
elsif
rather than "else if" or "elif". Syntax like:
if (expression) {
block;
}
else if (another_expression) {
another_block;
}
is illegal. The
else
part is always a
block, and a naked
if
is not a block.
You mustn't expect Perl to be exactly the same as C. What you want
instead is:
if (expression) {
block;
}
elsif (another_expression) {
another_block;
}
Note also that "elif" is "file" spelled backward. Only
Algol-ers would want a keyword that was the same as another word spelled
backward.
-
The
break
and
continue
keywords from C become in
Perl
last
and
next
, respectively.
Unlike in C, these do
not
work within a
do { } while
construct.
-
There's no switch statement. (But it's easy to build one on the fly; see
"Bare Blocks and Case Structures" in
Chapter 2
.)
-
Variables begin with
$
,
@
, or
%
in Perl.
-
printf
does not implement the
*
format for interpolating field widths, but it's
trivial to use interpolation of double-quoted strings to achieve the
same effect.
-
Comments begin with
#
, not
/*
.
-
You can't take the address of anything, although a similar operator
in Perl is the backslash, which creates a reference.
-
ARGV
must be capitalized.
$ARGV[0]
is C's
argv[1]
, and C's
argv[0]
ends up in
$0
.
-
Functions such as
link
,
unlink
, and
rename
return true for success, not
0
.
-
Signal handlers deal with signal names, not numbers.
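C programmers may find it helpful to see a handful of these points in running code. This is a small sketch under obvious assumptions: classify is a made-up subroutine, and the filename passed to unlink is hypothetical and presumed not to exist.

```perl
use strict;
use warnings;

# elsif, with braces around every block.
sub classify {
    my ($n) = @_;
    if    ($n < 0)  { return "negative" }
    elsif ($n == 0) { return "zero" }
    else            { return "positive" }
}

# next and last play the roles of C's continue and break.
my @kept;
for my $n (1 .. 10) {
    next if $n % 2;       # skip odd numbers (C: continue)
    last if $n > 6;       # leave the loop   (C: break)
    push @kept, $n;
}
# @kept is (2, 4, 6)

# Backslash creates a reference rather than taking an address.
my $value = 10;
my $ref   = \$value;
$$ref += 5;               # $value is now 15

# unlink returns the number of files removed, so failure is false (0).
my $removed = unlink("no-such-file-for-this-sketch.tmp");
```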
Seasoned
sed
programmers should take note of the
following:
-
Backreferences in substitutions use
$
rather than
\
.
-
The pattern matching metacharacters
(
,
)
, and
|
do not have backslashes in front. The corresponding literal
characters do.
-
The range operator in Perl is ... rather
than a comma.
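The three sed differences can be sketched as follows. The sample text is invented, and the input is read from an in-memory string through a lexical filehandle; the constant operands of the range operator are compared against the current input line number ($.).

```perl
use strict;
use warnings;

# Groups and alternation are unescaped; backreferences are $1, $2
# where sed would write \1, \2.
my $text = "foo-42";
(my $swapped = $text) =~ s/(foo|bar)-(\d+)/$2-$1/;    # "42-foo"

# Backslashed parentheses match the literal characters.
my $literal = ("f(x)" =~ /f\(x\)/) ? 1 : 0;           # 1

# sed's "2,3" address range becomes the ... range operator.
my $data = "one\ntwo\nthree\nfour\n";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
my @selected;
while (my $line = <$fh>) {
    push @selected, $line if 2 ... 3;   # true for input lines 2 through 3
}
close($fh);
```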
Sharp shell programmers should take note of the following:
-
Variables are prefixed with
$
or
@
on the left side of
the assignment as well as the right. A shellish assignment like:
camel='dromedary'; # WRONG
won't be parsed the way you expect. You need:
$camel='dromedary'; # ok
-
The loop variable of a
foreach
also requires a
$
.
Although
csh
likes:
foreach hump (one two)
stuff_it $hump
end
in Perl this is written as:
foreach $hump ("one", "two") {
stuff_it($hump);
}
-
The backtick operator does variable interpretation without regard to
the presence of single quotes in the command.
-
The backtick operator does no translation of the return value.
In Perl, you have to trim the newline explicitly, like this:
chomp($thishost = `hostname`);
-
Shells (especially
csh
) do several levels of substitution on each
command line. Perl does substitution only within certain constructs
such as double quotes, backticks, angle brackets, and search patterns.
-
Shells tend to interpret scripts a little bit at a time. Perl compiles
the entire program before executing it (except for
BEGIN
blocks,
which execute at compile time).
-
The arguments are available via
@ARGV
, not
$1
,
$2
, and so on.
-
The environment is not automatically made available as separate scalar
variables. But see the Env module.
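Shell programmers can try the points above with a short script. This sketch assumes a POSIX-like shell with an echo command is available for the backticks, uses chomp (which removes only a trailing newline), and invents the environment variable name CAMEL_KIND purely for illustration.

```perl
use strict;
use warnings;

my $camel = 'dromedary';              # the $ is needed on the left side too

my @humps;
foreach my $hump ("one", "two") {     # the loop variable carries a $ as well
    push @humps, "hump $hump";
}

# Perl interpolates inside backticks even within the command's single
# quotes, and the output keeps its trailing newline until you trim it.
my $raw = `echo '$camel'`;            # the shell sees: echo 'dromedary'
chomp(my $trimmed = $raw);

# The environment is reached through %ENV, not separate scalar variables.
$ENV{CAMEL_KIND} = $camel;            # hypothetical variable name
my $from_env = $ENV{CAMEL_KIND};
```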
Penitent Perl 4 (and Prior) Programmers should take note of the following
changes between Release 4 and Release 5 that might affect old scripts:
-
@
now always interpolates an array in double-quotish strings.
Some programs may now need to use backslash to protect any
@
that shouldn't interpolate.
-
Barewords that used to look like strings to Perl will now look like
subroutine calls if a subroutine by that name is defined before the
compiler sees them. For example:
sub SeeYa { die "Hasta la vista, baby!" }
$SIG{'QUIT'} = SeeYa;
In prior versions of Perl, that code would set the signal handler. Now, it
actually calls the function! You may use the
-w
switch to find such risky usage.
-
Symbols starting with "_" are no longer forced into package main, except
for
$_
itself (and
@_
, and so on).
-
Double-colon is now a valid package separator in an identifier. Thus,
the statement:
print "$a::$b::$c\n";
now parses
$a::
as the variable reference, where in
prior versions only the
$a
was considered to be the variable
reference. Similarly,
print "$var::abc::xyz\n";
is now interpreted as a single variable
$var::abc::xyz
,
whereas in prior versions, the variable
$var
would have been
followed by the constant text
::abc::xyz
.
-
s'$lhs'$rhs'
now does no interpolation on either side. It used to
interpolate
$lhs
but not
$rhs
.
-
The second and third arguments of
splice
are
now evaluated in scalar context (as documented) rather than list context.
-
These are now semantic errors because of precedence:
shift @list + 20; # now parses like shift(@list + 20), illegal!
$n = keys %map + 20; # now parses like keys(%map + 20), illegal!
Because if those were to work, then this couldn't:
sleep $dormancy + 20;
-
The precedence of assignment operators is now the same as the precedence
of assignment. Previous versions of Perl mistakenly gave them the
precedence of the associated operator. So you now must parenthesize
them in expressions like
/foo/ ? ($a += 2) : ($a -= 2);
Otherwise:
/foo/ ? $a += 2 : $a -= 2;
would be erroneously parsed as:
(/foo/ ? $a += 2 : $a) -= 2;
On the other hand,
$a += /foo/ ? 1 : 2;
now works as a C programmer would expect.
-
open FOO || die
is now incorrect. You need parentheses around
the filehandle, because
open
has the precedence of a list operator.
-
The elements of argument lists for formats are now evaluated in list
context. This means you can interpolate list values now.
-
You can't do a
goto
into a block that is optimized away. Darn.
-
It is no longer syntactically legal to use whitespace as the name
of a variable, or as a delimiter for any kind of quote construct.
Double darn.
-
The
caller
function now returns a false value in a scalar context
if there is no caller. This lets library modules determine whether
they're being required or run directly.
-
m//g
now attaches its state to the searched string rather than
the regular expression. See "Regular Expressions" in
Chapter 2
for
further details.
-
reverse
is no longer allowed as the name of
a
sort
subroutine.
-
taintperl
is no longer a separate executable.
There is now a
-T
switch to turn on tainting when it isn't turned on automatically.
-
Double-quoted strings may no longer end with an unescaped
$
or
@
.
-
The archaic
if
BLOCK BLOCK
syntax is no longer supported.
-
Negative array subscripts now count from the end of the array.
-
The comma operator in a scalar context is now guaranteed to give a
scalar context to its arguments.
-
The
**
operator now binds more tightly than unary minus.
It was documented to work this way before, but didn't.
-
Setting
$#array
lower now discards array elements immediately.
-
delete
is not guaranteed to return the deleted value for
tie
d arrays, since this capability may be onerous for some modules
to implement.
-
The construct
"this is $$x"
, which used to interpolate the pid at that
point, now tries to dereference
$x
.
$$
by itself still
works fine, however.
-
The meaning of
foreach
has changed slightly when it is iterating over a
list which is not an array. This used to assign the list to a
temporary array, but for efficiency it no longer does so. This means
that you'll now be iterating over the actual values, not over copies of
the values. Modifications to the loop variable can change the original
values. To retain prior Perl semantics you'd need to assign your list
explicitly to a temporary array and then iterate over that. For
example, you might need to change:
foreach $var (grep /x/, @list) { ... }
to:
foreach $var (my @tmp = grep /x/, @list) { ... }
Otherwise changing
$var
will clobber the values of
@list
. (This most often happens when you use
$_
for the
loop variable, and call subroutines in the loop that don't properly
localize
$_
.)
-
Some error messages will be different.
-
Some bugs may have been inadvertently removed.
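A few of the Release 5 behaviors listed above are easy to confirm for yourself. The script below is a sketch with invented data; it uses the block form of grep and lexical loop variables, which are stylistic choices of this example rather than anything the list requires.

```perl
use strict;
use warnings;

# foreach now iterates over the actual values, so this changes @list.
my @list = ("ax", "by", "cx");
foreach my $var (grep { /x/ } @list) {
    $var = uc $var;
}
# @list is now ("AX", "by", "CX")

# Assigning to a temporary array first restores the old copy semantics.
my @safe = ("ax", "by", "cx");
foreach my $var (my @tmp = grep { /x/ } @safe) {
    $var = uc $var;                   # changes only @tmp
}

# Negative subscripts count from the end of the array.
my @nums = (10, 20, 30);
my $last = $nums[-1];                 # 30

# ** binds more tightly than unary minus.
my $neg = -2 ** 2;                    # -(2 ** 2), that is, -4

# Setting $#array lower discards the elements beyond it.
$#nums = 0;                           # @nums is now (10)

# An @ that should not interpolate needs a backslash.
my $addr = "user\@example.com";
```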