-
Use hashes instead of linear searches.
For example, instead of searching through
@keywords
to see if
$_
is a keyword, construct a hash with:
my %keywords;
for (@keywords) {
    $keywords{$_}++;
}
Then, you can quickly tell if
$_
contains a keyword by testing
$keywords{$_}
for a non-zero value.
-
Avoid subscripting when a
foreach
or list operator will do. Subscripting
sometimes forces conversion from floating point to integer, and
there's often a better way to do it. Consider using
foreach
,
shift
,
and
splice
operations. Consider saying
use integer
.
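As a minimal sketch (the array @prices and the other names are invented for illustration), the two loops below compute the same total. The second lets foreach alias each element in turn instead of indexing into the array:

```perl
use strict;
use warnings;

my @prices = (3, 5, 8);    # hypothetical data

# Subscripted version: every pass evaluates $i and indexes the array.
my $indexed_total = 0;
for (my $i = 0; $i < @prices; $i++) {
    $indexed_total += $prices[$i];
}

# foreach version: $price is aliased to each element, no subscript needed.
my $total = 0;
foreach my $price (@prices) {
    $total += $price;
}
```

Whether the difference is measurable depends on the loop body, so time it on your own data.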
-
Avoid
goto
.
It scans outward from your current location for the indicated label.
-
Avoid
printf
if
print
will work.
Quite apart from the extra overhead of
printf
, some
implementations have field length limitations that
print
gets
around.
-
Avoid
$&
,
$`
,
and
$'
.
Any occurrence in your program causes all matches to save the searched
string for possible future reference. (However, once you've blown it, it
doesn't hurt to have more of them.)
-
Avoid using
eval
on a string. An
eval
of a string (not of a
BLOCK
) forces recompilation every time through. The
Perl parser is pretty fast for a parser, but that's not saying much. Nowadays
there's almost always a better way to do what you want anyway. In particular,
any code that uses
eval
merely to construct
variable names is obsolete, since you can now do the same directly using
symbolic references:
${$pkg . '::' . $varname} = &{ "fix_" . $varname }($pkg);
-
Avoid string
eval
inside a loop.
Put the loop into the
eval
instead, to avoid redundant
recompilations of the code. See the
study
operator
in
Chapter 3
for an example of this.
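A rough sketch of the idea, assuming a search word that is known only at run-time (@lines and $word are hypothetical names): the whole loop is placed inside the evaluated string, so the code is compiled once rather than once per line.

```perl
use strict;
use warnings;

my @lines = ("banana", "mango", "cherry");   # hypothetical input
my $word  = "an";                            # known only at run-time

# Build the loop as text, then compile and run it with a single eval.
my $code = 'my $n = 0; for (@lines) { $n++ if /'
         . quotemeta($word)
         . '/ } $n;';
my $count = eval $code;
die $@ if $@;
```

Here quotemeta keeps any metacharacters in $word from being treated as pattern syntax or as a premature delimiter.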
-
Avoid run-time-compiled patterns. Use the
/pattern/o
(once only) pattern modifier to avoid pattern recompilation when the
pattern doesn't change over the life of the process.
For patterns that change
occasionally, you can use the fact that a null pattern refers back to
the previous pattern, like this:
"foundstring" =~ /$currentpattern/; # Dummy match (must succeed).
while (<>) {
    print if //;
}
You can also use
eval
to recompile a subroutine that does the match (if
you only recompile occasionally).
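One way this might look, as a sketch only: make_matcher is a hypothetical helper, not a standard function. Each call evaluates a fresh anonymous subroutine, and the /o modifier inside it means the pattern is compiled once for that subroutine.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a subroutine that matches its argument
# against $pattern. Call it again only when the pattern changes.
sub make_matcher {
    my ($pattern) = @_;
    my $sub = eval 'sub { $_[0] =~ /$pattern/o }';
    die $@ if $@;
    return $sub;
}

my $is_hump = make_matcher('hump');
my $is_two  = make_matcher('two');
```

A caller would then write something like `print if $is_hump->($_)` inside the loop, rebuilding the matcher only on the rare occasions when the pattern changes.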
-
Short-circuit alternation is often faster than the corresponding
regular expression. So:
print if /one-hump/ || /two/;
is likely to be faster than:
print if /one-hump|two/;
at least for certain values of one-hump and two.
This is because the optimizer likes to hoist certain simple matching
operations up into higher parts of the syntax tree and do very fast
matching with a Boyer-Moore algorithm. A complicated pattern defeats
this.
-
Reject common cases early with
next if
.
As with simple regular expressions, the optimizer likes this. And it just
makes sense to avoid unnecessary work. You can typically discard comment
lines and blank lines even before you do a
split
or
chomp
:
while (<>) {
    next if /^#/;
    next if /^$/;
    chomp;
    @piggies = split(/,/);
    ...
}
-
Avoid regular expressions with many quantifiers, or with big
{m,n}
numbers on parenthesized expressions. Such patterns can result in
exponentially slow backtracking behavior unless the quantified
subpatterns match on their first "pass".
-
Try to maximize the length of any non-optional literal strings in
regular expressions. This is counterintuitive, but longer patterns
often match faster than shorter patterns. That's because the
optimizer looks for constant strings and hands them off to a
Boyer-Moore search, which benefits from longer strings. Compile your
pattern with the
-Dr
debugging switch to see what
Perl thinks the longest literal string is.
-
Avoid expensive subroutine calls in tight loops.
There is overhead associated with calling subroutines, especially when
you pass lengthy parameter lists, or return lengthy values. In
increasing order of desperation, try passing values by reference,
passing values as dynamically scoped globals, inlining the subroutine,
or rewriting the whole loop in C.
-
Avoid
getc
for anything but single-character terminal I/O.
In fact, don't use it for that either. Use
sysread
.
-
Use
readdir
rather than
<*>
.
To get all the non-dot files within a directory, say something like:
opendir(DIR, ".") or die "Can't open current directory: $!";
@files = sort grep(!/^\./, readdir(DIR));
closedir(DIR);
-
Avoid frequent
substr
on long strings.
-
Use
pack
and
unpack
instead of multiple
substr
invocations.
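For instance, assuming a made-up fixed-width record of an 8-character date, a 10-character name, and a 4-character quantity, one unpack call can stand in for three substr calls:

```perl
use strict;
use warnings;

# Hypothetical fixed-width record; pack pads each "A" field with spaces.
my $record = pack("A8 A10 A4", "19970315", "Camel", "0042");

# Three separate substr calls, each one a trip through the string...
my $date_s = substr($record, 0, 8);
my $name_s = substr($record, 8, 10);
my $qty_s  = substr($record, 18, 4);

# ...versus a single unpack. The "A" format also strips trailing spaces.
my ($date, $name, $qty) = unpack("A8 A10 A4", $record);
```

Note the one behavioral difference: substr returns the name field with its padding, while unpack's A format removes it.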
-
Use
substr
as an lvalue rather than
concatenating substrings. For example, to replace the fourth through sixth
characters of
$foo
with the contents of the variable
$bar
, don't do:
$foo = substr($foo,0,3) . $bar . substr($foo,6);
Instead, simply identify the part of the string to be replaced,
and assign into it, as in:
substr($foo,3,3) = $bar;
But be aware that if
$foo
is a huge string, and
$bar
isn't exactly
3
characters long, this can do a lot of copying too.
-
Use
s///
rather than concatenating substrings.
This is especially true if you can replace one constant with another of
the same size. This results in an in-place substitution.
-
Use modifiers and equivalent
and
and
or
, instead of
full-blown conditionals.
Statement modifiers and logical operators avoid the overhead of entering
and leaving a block. They can often be more readable too.
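A small illustration of the three forms, with invented variables ($verbose, $count, and @log are not from any real program):

```perl
use strict;
use warnings;

my @log;
my ($verbose, $count) = (1, 0);   # hypothetical settings

# Full-blown conditional: a block is entered and left.
if ($verbose) {
    push @log, "block form";
}

# Statement modifier: the same test without a block.
push @log, "modifier form" if $verbose;

# Logical operator: the right side runs only when the left side is false.
$count > 0 or push @log, "nothing to count";
```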
-
Use
$foo = $a || $b || $c
.
This is much faster (and shorter to say) than:
if ($a) {
    $foo = $a;
}
elsif ($b) {
    $foo = $b;
}
elsif ($c) {
    $foo = $c;
}
Similarly, set default values with:
$pi ||= 3;
-
Group together any tests that want the same initial string.
When testing a string for various prefixes in anything resembling a
switch structure, put together all the
/^a/
patterns, all the
/^b/
patterns, and so on.
-
Don't test things you know won't match.
Use
last
or
elsif
to avoid falling through to the next
case in your switch statement.
-
Use special operators like
study
, logical string operations,
pack 'u'
and
unpack '%'
formats.
-
Beware of the tail wagging the dog.
Misstatements resembling
(<STDIN>)[0]
and
0 .. 2000000
can
cause Perl much unnecessary work. In accord with UNIX philosophy, Perl
gives you enough rope to hang yourself.
-
Factor operations out of loops. The Perl optimizer does not attempt to
remove invariant code from loops. It expects you to exercise some sense.
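A trivial sketch of hoisting an invariant by hand (the data and names are invented): the constant is computed once before the loop rather than on every pass.

```perl
use strict;
use warnings;

my @radii = (1, 2, 3);          # hypothetical data

# Invariant: computed once, outside the loop.
my $pi = 4 * atan2(1, 1);

my @areas;
foreach my $r (@radii) {
    # Only the part that depends on $r stays inside the loop.
    push @areas, $pi * $r * $r;
}
```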
-
Slinging strings can be faster than slinging arrays.
-
Slinging arrays can be faster than slinging strings.
It all depends on whether you're going to reuse the strings or arrays,
and on which operations you're going to perform. Heavy modification of each
element implies that arrays will be better, and occasional modification of
some elements implies that strings will be better. But you just have to
try it and see.
-
my
variables are normally
faster than
local
variables.
-
Sorting on a manufactured key array may be faster than using a fancy sort
subroutine.
A given array value may participate in several sort comparisons, so if
the sort subroutine has to do much recalculation, it's better to
factor out that calculation to a separate pass before the actual sort.
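One possible shape for this, sketched with an invented word list and string length standing in for an expensive key: compute the keys in one pass, sort the indices by those keys, then pull out the values with a slice.

```perl
use strict;
use warnings;

my @words = ("camel", "ox", "dromedary", "llamas");   # hypothetical data

# Separate pass: compute each sort key exactly once.
my @keys = map { length($_) } @words;

# Sort the indices by the precomputed keys, then fetch the values.
my @sorted = @words[ sort { $keys[$a] <=> $keys[$b] } 0 .. $#words ];
```

With a key as cheap as length the extra pass is unlikely to pay off; the approach is aimed at keys that take real work to compute.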
-
tr/abc//d
is faster than
s/[abc]//g
.
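The two forms side by side, on an invented string. As a bonus, tr returns the number of characters it deleted:

```perl
use strict;
use warnings;

my $with_tr = "a1b2c3abc";   # hypothetical input
my $with_s  = "a1b2c3abc";

# tr/// deletes the listed characters without using the regex engine.
my $deleted = ($with_tr =~ tr/abc//d);

# The equivalent substitution goes through the pattern matcher.
$with_s =~ s/[abc]//g;
```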
-
print
with a comma separator may be faster than concatenating strings.
For example:
print $fullname{$name} . " has a new home directory " .
$home{$name} . "\n";
has to glue together the two hash values and the two
fixed strings before passing them to the low-level print routines, whereas:
print $fullname{$name}, " has a new home directory ",
$home{$name}, "\n";
doesn't. On the other hand, depending on the values and the architecture,
the concatenation may be faster. Try it.
-
Prefer
join("", ...)
to a series of concatenated strings.
Multiple concatenations may cause strings to be copied back and
forth multiple times. The
join
operator avoids this.
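For example (the variables here are invented for illustration), both statements below build the same string, but the second hands all the pieces to join at once:

```perl
use strict;
use warnings;

my ($user, $host, $port) = ("larry", "example.com", 8080);   # hypothetical values

# A chain of concatenations may build and copy intermediate strings.
my $chained = $user . '@' . $host . ':' . $port;

# join assembles the result from the whole list in one operation.
my $joined = join("", $user, '@', $host, ':', $port);
```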
-
split
on a fixed string is generally faster than
split
on a
pattern.
That is, use
split(/ /,...)
rather than
split(/ +/,...)
if you know there will only be one space.
However, the patterns
/\s+/
,
/^/
and
/ /
are
specially optimized, as is the
split
on whitespace.
-
Pre-extending an array or string can save some time.
As strings and arrays grow, Perl extends them by allocating a new copy
with some room for growth and copying in the old value. Pre-extending a
string with the
x
operator or an array by setting
$#array
can prevent this occasional overhead, as well as minimize memory
fragmentation.
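A minimal sketch of both kinds of pre-extension; the sizes and variable names are arbitrary choices for illustration:

```perl
use strict;
use warnings;

# Pre-extend an array to 10,000 elements by setting its last index.
my @samples;
$#samples = 9_999;

# Filling it in now should not require the array to grow again.
$samples[$_] = $_ * 2 for 0 .. 9_999;

# Pre-extend a string to 8192 bytes with the x operator.
my $buffer = "\0" x 8192;
```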
-
Don't
undef
long strings and arrays if they'll be reused for the
same purpose.
This helps prevent reallocation when the string or array must be re-extended.
-
Prefer
"\0" x 8192
over
pack("x8192")
.
-
system("mkdir...")
may be faster on multiple directories if
mkdir(2) isn't available.
-
Avoid using
eof
if return values will already indicate it.
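As a sketch, a read loop can rely on the undefined value that the line-input operator returns at end of file, with no call to eof. The in-memory filehandle here is only a stand-in for a real file:

```perl
use strict;
use warnings;

my $data = "one\ntwo\nthree\n";   # hypothetical file contents
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";

my $count = 0;
# No eof test needed: <$fh> returns undef when the input runs out.
while (defined(my $line = <$fh>)) {
    $count++;
}
close($fh);
```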
-
Cache entries from passwd and group (and so on) that are apt to be reused.
For example, to cache the return value from
gethostbyaddr
when
you are converting numeric addresses (like
198.112.208.11
) to names
(like "www.ora.com"), you can use something like:
sub numtoname {
    local($_) = @_;
    unless (defined $numtoname{$_}) {
        # The address type 2 is AF_INET on most systems.
        local(@a) = gethostbyaddr(pack('C4', split(/\./)),2);
        $numtoname{$_} = @a > 0 ? $a[0] : $_;
    }
    $numtoname{$_};
}
-
Avoid unnecessary system calls.
Operating system calls tend to be rather expensive. So for example,
don't call the
time
operator when a cached value of
$now
would do. Use the special
_
filehandle to avoid unnecessary
stat(2) calls. On some systems, even a minimal system call may
execute a thousand instructions.
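A short sketch of the special _ filehandle, using the current directory as a convenient test subject: the first file test does the stat, and the later ones reuse its result.

```perl
use strict;
use warnings;

my $path = ".";              # the current directory, chosen for illustration

my $exists = -e $path;       # this test performs the stat call
my $is_dir = -d _;           # these reuse the saved stat structure
my $size   = -s _;
```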
-
Avoid unnecessary
system
calls.
The
system
operator has to fork a subprocess and execute the
program you specify. Or worse, execute a shell to execute the program
you specify. This can easily execute a million instructions.
-
Worry about starting subprocesses, but only if they're frequent.
Starting a single
pwd
,
hostname
, or
find
process isn't
going to hurt you much; after all, a shell starts subprocesses all day
long. We do occasionally encourage the toolbox approach, believe it or not.
-
Keep track of your working directory yourself rather than calling
pwd
repeatedly.
(A package is provided in the standard library for this.
See the Cwd module in
Chapter 7
.)
-
Avoid shell metacharacters in commands; pass lists to
system
and
exec
where appropriate.
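To illustrate, the sketch below passes a list to system, so no shell is started and the metacharacters in $tricky (an invented value) reach the child untouched. The child is the Perl interpreter itself, via $^X, chosen only so that the example does not depend on any other program being installed:

```perl
use strict;
use warnings;

my $tricky = 'two words; $HOME *';   # hypothetical argument full of metacharacters

# List form: the program and its arguments are passed directly, no shell.
# The child exits 0 only if it received exactly one argument.
my $status = system($^X, "-e", 'exit(@ARGV == 1 ? 0 : 1)', $tricky);
```

Had the same text been passed as a single command string, a shell would have been started to interpret the semicolon, the variable, and the wildcard.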
-
Set the sticky bit on the Perl interpreter on machines without demand paging.
chmod +t /usr/bin/perl
-
Using defaults doesn't make your program faster.