24.2. Efficiency
While most of the work of programming may be simply getting your program
working properly, you may find yourself wanting more bang for the buck
out of your Perl program. Perl's rich set of operators, data types, and
control constructs are not necessarily intuitive when it comes to speed
and space optimization. Many trade-offs were made during Perl's design,
and such decisions are buried in the guts of the code. In general, the
shorter and simpler your code is, the faster it runs, but there are
exceptions. This section attempts to help you make it work just a wee
bit better.
If you want it to work a lot better, you can play with the Perl
compiler backend described in Chapter 18, "Compiling", or rewrite your
inner loop as a C extension as illustrated in Chapter 21, "Internals and Externals".
Note that optimizing for time may sometimes cost you in space or
programmer efficiency (indicated by conflicting hints below). Them's
the breaks. If programming was easy, they wouldn't need something as
complicated as a human being to do it, now would they?
24.2.1. Time Efficiency
-
Use hashes instead of linear searches. For example, instead of searching
through @keywords to see if $_ is a keyword, construct a hash
with:
my %keywords;
for (@keywords) {
    $keywords{$_}++;
}
Then you can quickly tell if $_ contains a keyword by testing
$keywords{$_} for a nonzero value.
-
Avoid subscripting when a foreach or list operator
will do. Not only is subscripting an extra operation, but if your
subscript variable happens to be in floating point because you did
arithmetic, an extra conversion from floating point back to integer is
necessary. There's often a better way to do it. Consider using
foreach, shift, and
splice operations. Consider saying use
integer.
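As a rough sketch of the difference (the array @scores and its contents are made up for illustration), the two loops below compute the same total, but the second never touches a subscript:

```perl
use strict;
use warnings;

my @scores = (90, 85, 77, 64);    # hypothetical sample data

# Subscripted version: one extra index operation per element.
my $total_indexed = 0;
for (my $i = 0; $i < @scores; $i++) {
    $total_indexed += $scores[$i];
}

# foreach version: $score is aliased to each element in turn.
my $total_foreach = 0;
foreach my $score (@scores) {
    $total_foreach += $score;
}

print "$total_indexed $total_foreach\n";
```

Whether the difference is measurable depends on the loop body, so benchmark before rewriting anything that already works.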
-
Avoid goto. It scans outward from your current location for the
indicated label.
-
Avoid printf when print will do.
-
Avoid $& and its two buddies, $` and $'. Any occurrence in
your program causes all matches to save the searched string for
possible future reference. (However, once you've blown it, it doesn't
hurt to have more of them.)
-
Avoid using eval on a string. An
eval of a string (although not of a
BLOCK) forces recompilation every time
through. The Perl parser is pretty fast for a parser, but that's not
saying much. Nowadays there's almost always a better way to do what
you want anyway. In particular, any code that uses
eval merely to construct variable names is obsolete
since you can now do the same directly using symbolic references:
no strict 'refs';
$name = "variable";
$$name = 7; # Sets $variable to 7
-
Avoid eval STRING inside
a loop. Put the loop into the eval instead, to
avoid redundant recompilations of the code. See the
study operator in Chapter 29, "Functions"
for an example of this.
-
Avoid run-time-compiled patterns. Use the
/pattern/o
(once only) pattern modifier to avoid pattern recompilation when the
pattern doesn't change over the life of the process. For patterns that
change occasionally, you can use the fact that a null pattern refers
back to the previous pattern, like this:
"foundstring" =~ /$currentpattern/; # Dummy match (must succeed).
while (<>) {
    print if //;
}
Alternatively, you can precompile your regular expression using the qr
quote construct. You can also use eval to recompile a subroutine
that does the match (if you only recompile occasionally). That works even better if you compile a bunch of matches into a single subroutine, thus amortizing the subroutine call overhead.
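Here is one way the qr approach might look. The variable names and sample lines are invented for the example, and \Q...\E is added only so that the run-time string is treated literally:

```perl
use strict;
use warnings;

my $word = "camel";                     # hypothetical run-time input
my $re   = qr/\b\Q$word\E\b/i;          # compiled once, right here

my @lines = ("One camel", "two llamas", "CAMEL three");
my @hits  = grep { $_ =~ $re } @lines;  # reuses the compiled pattern

print scalar(@hits), " matching lines\n";
```

Because $re is already compiled, matching against it by itself inside the loop should not trigger a recompilation on each pass.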
-
Short-circuit alternation is often faster than the corresponding regex. So:
print if /one-hump/ || /two/;
is likely to be faster than:
print if /one-hump|two/;
at least for certain values of one-hump and two. This is because the
optimizer likes to hoist certain simple matching operations up into
higher parts of the syntax tree and do very fast matching with a
Boyer-Moore algorithm. A complicated pattern tends to defeat this.
-
Reject common cases early with next if. As with simple regular
expressions, the optimizer likes this. And it just makes sense to avoid
unnecessary work. You can typically discard comment lines and blank
lines even before you do a split or chomp:
while (<>) {
    next if /^#/;
    next if /^$/;
    chomp;
    @piggies = split(/,/);
    ...
}
-
Avoid regular expressions with many quantifiers or with big
{MIN,MAX} numbers on parenthesized expressions. Such patterns
can result in exponentially slow backtracking behavior unless the
quantified subpatterns match on their first "pass". You can also
use the (?>...) construct to force a subpattern to either
match completely or fail without backtracking.
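A tiny, contrived illustration of what (?>...) does (the test string is arbitrary): the plain quantifier gives a character back so that the overall match can succeed, while the atomic group refuses to.

```perl
use strict;
use warnings;

my $str = "aaa";

# Ordinary quantifier: a+ first grabs "aaa", then gives back one "a"
# so that the trailing literal a can match.
my $plain  = ($str =~ /^a+a$/)     ? 1 : 0;

# Atomic group: (?>a+) keeps all of "aaa" and will not give any back,
# so the trailing a has nothing left to match and the pattern fails.
my $atomic = ($str =~ /^(?>a+)a$/) ? 1 : 0;

print "plain: $plain, atomic: $atomic\n";
```

The same refusal to backtrack is what stops a pathological pattern from retrying every possible split of the string, but it also changes what the pattern matches, so use it only where that is what you mean.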
-
Try to maximize the length of any nonoptional literal strings in
regular expressions. This is counterintuitive, but longer patterns
often match faster than shorter patterns. That's because the optimizer
looks for constant strings and hands them off to a Boyer-Moore search,
which benefits from longer strings. Compile your pattern with Perl's
-Dr debugging switch to see what Dr. Perl thinks the longest literal
string is.
-
Avoid expensive subroutine calls in tight loops. There is overhead
associated with calling subroutines, especially when you pass lengthy
parameter lists or return lengthy values. In order of increasing
desperation, try passing values by reference, passing values as
dynamically scoped globals, inlining the subroutine, or rewriting the
whole loop in C. (Better than all of those solutions is if you can define the
subroutine out of existence by using a smarter algorithm.)
-
Avoid getc for anything but single-character terminal I/O. In fact,
don't use it for that either. Use sysread.
-
Avoid frequent substrs on long strings, especially if the string
contains UTF-8. It's okay to use substr at the front of a string,
and for some tasks you can keep the substr at the front by "chewing up"
the string as you go with a four-argument substr, replacing the
part you grabbed with "":
while (length $buffer) {
    process(substr($buffer, 0, 10, ""));
}
-
Use pack and unpack instead of multiple substr invocations.
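For instance, assuming a made-up fixed-width record layout (10 characters of name, 3 of age, 2 of state), a single unpack can stand in for three substr calls:

```perl
use strict;
use warnings;

# Hypothetical fixed-width record: 10-char name, 3-char age, 2-char state.
my $record = sprintf("%-10s%3s%2s", "Fred", "042", "CA");

# Three separate substr calls...
my $name1  = substr($record, 0, 10);
my $age1   = substr($record, 10, 3);
my $state1 = substr($record, 13, 2);

# ...versus a single unpack. The "A" format also strips trailing spaces.
my ($name, $age, $state) = unpack("A10 A3 A2", $record);

print "$name is $age and lives in $state\n";
```

Note that the substr version keeps the padding in $name1, so the two are not quite interchangeable.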
-
Use substr as an lvalue rather than concatenating substrings. For
example, to replace the fourth through seventh characters of $foo with
the contents of the variable $bar, don't do this:
$foo = substr($foo,0,3) . $bar . substr($foo,7);
Instead, simply identify the part of the string to be replaced and
assign into it, as in:
substr($foo, 3, 4) = $bar;
But be aware that if $foo is a huge string and $bar isn't
exactly the length of the "hole", this can do a lot of copying too. Perl tries to minimize that by copying from either the front or the
back, but there's only so much it can do if the substr is in the
middle.
-
Use s/// rather than concatenating substrings. This is especially
true if you can replace one constant with another of the same size. This results in an in-place substitution.
-
Use statement modifiers and equivalent and and or operators
instead of full-blown conditionals. Statement modifiers (like $ring
= 0 unless $engaged) and logical operators avoid the overhead of
entering and leaving a block. They can often be more readable too.
-
Use $foo = $a || $b || $c. This is much faster (and shorter to say)
than:
if ($a) {
    $foo = $a;
}
elsif ($b) {
    $foo = $b;
}
elsif ($c) {
    $foo = $c;
}
Similarly, set default values with:
$pi ||= 3;
-
Group together any tests that want the same initial string. When testing
a string for various prefixes in anything resembling a switch structure,
put together all the /^a/ patterns, all the /^b/ patterns, and so
on.
-
Don't test things you know won't match. Use last or elsif to
avoid falling through to the next case in your switch statement.
-
Use special operators like study, logical string operations, pack
'u', and unpack '%' formats.
-
Beware of the tail wagging the dog. Misstatements resembling
(<STDIN>)[0] can cause Perl much unnecessary work. In accordance
with Unix philosophy, Perl gives you enough rope to hang yourself.
-
Factor operations out of loops. The Perl optimizer does not attempt to
remove invariant code from loops. It expects you to exercise some sense.
-
Strings can be faster than arrays.
-
Arrays can be faster than strings. It all depends on
whether you're going to reuse the strings or arrays and which
operations you're going to perform. Heavy modification of each element
implies that arrays will be better, and occasional modification of some
elements implies that strings will be better. But you just have to try
it and see.
-
my variables are faster than local variables.
-
Sorting on a manufactured key array may be faster than using a fancy
sort subroutine. A given array value will usually be compared multiple
times, so if the sort subroutine has to do much recalculation, it's
better to factor out that calculation to a separate pass before the
actual sort.
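One possible shape for this, sketched with invented data and a stand-in sort_key function (in real code that would be whatever expensive calculation your sort subroutine was doing):

```perl
use strict;
use warnings;

my @words = ("banana", "fig", "apple", "kiwi");   # hypothetical data

# Stand-in for an expensive key computation.
sub sort_key { return length $_[0] }

# First pass: compute each key exactly once, in an array parallel to @words.
my @keys = map { sort_key($_) } @words;

# Sort the indices by the precomputed keys, then slice out the result.
my @order  = sort { $keys[$a] <=> $keys[$b] } 0 .. $#words;
my @sorted = @words[@order];

print "@sorted\n";
```

The comparison block now does nothing but two array lookups, however often sort decides to call it.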
-
If you're deleting characters, tr/abc//d is faster than s/[abc]//g.
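A quick side-by-side, using an arbitrary sample string, shows that the two forms leave the same result; tr additionally returns the number of characters it deleted:

```perl
use strict;
use warnings;

my $with_tr = "a1b2c3 abc";    # arbitrary sample text
my $with_s  = $with_tr;

# tr///d deletes the listed characters and returns how many it removed.
my $count = ($with_tr =~ tr/abc//d);

# The regex version does the same job through the pattern-matching engine.
$with_s =~ s/[abc]//g;

print "tr: '$with_tr' ($count removed), s: '$with_s'\n";
```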
-
print with a comma separator may be faster than concatenating
strings. For example:
print $fullname{$name} . " has a new home directory " .
$home{$name} . "\n";
has to glue together the two hash values and the two fixed strings before
passing them to the low-level print routines, whereas:
print $fullname{$name}, " has a new home directory ",
$home{$name}, "\n";
doesn't. On the other hand, depending on the values and the
architecture, the concatenation may be faster. Try it.
-
Prefer join("", ...) to a series of concatenated strings. Multiple
concatenations may cause strings to be copied back and forth multiple
times. The join operator avoids this.
-
split on a fixed string is generally faster than split on a
pattern. That is, use split(/ /, ...) rather than split(/ +/, ...)
if you know there will only be one space. However, the patterns
/\s+/, /^/, and / / are specially optimized, as is the special split
on whitespace.
-
Pre-extending an array or string can save some time. As strings and
arrays grow, Perl extends them by allocating a new copy with some room
for growth and copying in the old value. Pre-extending a string with
the x operator or an array by setting
$#array can prevent this
occasional overhead and reduce memory fragmentation.
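For the array case, a sketch might look like this, assuming you know the final size ahead of time (the figure of 10,000 is arbitrary):

```perl
use strict;
use warnings;

my $expected = 10_000;           # hypothetical size known in advance

my @results;
$#results = $expected - 1;       # pre-extend to $expected (undefined) slots

for my $i (0 .. $expected - 1) {
    $results[$i] = $i * 2;       # fills slots that already exist
}

print scalar(@results), " elements\n";
```

Assign by index after pre-extending. A push would append after the slots you just created rather than filling them.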
-
Don't undef long strings and arrays if they'll be reused for the same
purpose. This helps prevent reallocation when the string or array must
be re-extended.
-
Prefer "\0" x 8192 over pack("x8192").
-
system("mkdir ...") may be faster on multiple directories if the
mkdir syscall isn't available.
-
Avoid using eof if return values will already indicate it.
-
Cache entries from files (like passwd and
group files) that are apt to be reused. It's
particularly important to cache entries from the network. For
example, to cache the return value from
gethostbyaddr when you are converting numeric
addresses (like 204.148.40.9) to names (like
"www.oreilly.com"), you can use something like:
sub numtoname {
    local ($_) = @_;
    unless (defined $numtoname{$_}) {
        # 2 is the usual value of AF_INET
        my (@a) = gethostbyaddr(pack('C4', split(/\./)), 2);
        $numtoname{$_} = @a > 0 ? $a[0] : $_;
    }
    return $numtoname{$_};
}
-
Avoid unnecessary syscalls. Operating system calls tend to be rather
expensive. So for example, don't call the time operator when a
cached value of $now would do. Use the special _ filehandle to
avoid unnecessary stat(2) calls. On some systems, even a minimal
syscall may execute a thousand instructions.
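A small sketch of the _ trick follows. The path "." is used only because the current directory can be assumed to exist; substitute whatever file you are actually testing:

```perl
use strict;
use warnings;

my $path = ".";    # assumed to exist; stands in for a real filename

# -e performs the real stat(2); the tests on _ reuse the saved result
# instead of asking the operating system again.
my $is_readable_dir = (-e $path && -d _ && -r _) ? 1 : 0;

print "$path is ", ($is_readable_dir ? "" : "not "), "a readable directory\n";
```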
-
Avoid unnecessary system calls. The system function has to fork a
subprocess in order to execute the program you specify--or worse, execute a
shell to execute the program. This can easily execute a
million instructions.
-
Worry about starting subprocesses, but only if they're frequent. Starting a single pwd, hostname, or find process isn't going to
hurt you much--after all, a shell starts subprocesses all day long. We
do occasionally encourage the toolbox approach, believe it or not.
-
Keep track of your working directory yourself rather than calling pwd
repeatedly. (A standard module is provided for this. See
Cwd in Chapter 30, "The Standard Perl Library".)
-
Avoid shell metacharacters in commands--pass lists to system and
exec where appropriate.
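The list form might be sketched like this. To keep the example self-contained it runs a second copy of perl via $^X rather than some external command, and the awkward argument is invented:

```perl
use strict;
use warnings;

my $arg = "two words; echo oops";   # hypothetical argument full of metacharacters

# List form: the program is run directly, no shell is started,
# and $arg arrives in the child as a single untouched argument.
# The child exits 0 only if it received exactly one argument.
my $status = system($^X, "-e", 'exit(@ARGV == 1 ? 0 : 1)', $arg);

print "child exit status: ", $status >> 8, "\n";
```

Had the same text been interpolated into a single command string, a shell would have been started to make sense of the semicolon.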
-
Set the sticky bit on the Perl interpreter on machines without demand
paging:
chmod +t /usr/bin/perl
-
Allowing built-in functions' arguments to default to $_ doesn't make your
program faster.
24.2.2. Space Efficiency
-
You can use vec for compact integer array storage
if the integers are of fixed width. (Integers of variable width can
be stored in a UTF-8 string.)
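As a minimal sketch, assuming your integers fit in 4 bits (values 0 through 15), a thousand of them can be kept in one string like so:

```perl
use strict;
use warnings;

my $packed = "";

# Store 1000 small integers at 4 bits each.
for my $i (0 .. 999) {
    vec($packed, $i, 4) = $i % 16;
}

print "bytes used: ", length($packed), "\n";
print "element 37: ", vec($packed, 37, 4), "\n";
```

The width given to vec must be a power of two, so pick the smallest one that holds your largest value.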
-
Prefer numeric values over equivalent string values--they require less
memory.
-
Use substr to store constant-length strings in a longer string.
-
Use the Tie::SubstrHash module for very compact storage of a hash array,
if the key and value lengths are fixed.
-
Use __END__ and the DATA filehandle to avoid storing program data
as both a string and an array.
-
Prefer each to keys where order doesn't matter.
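A side-by-side sketch, with a made-up hash, of the two ways to walk the same data:

```perl
use strict;
use warnings;

my %age = (fred => 42, wilma => 40, pebbles => 2);   # hypothetical data

# keys builds a list of every key before the loop starts...
my $sum_keys = 0;
for my $name (keys %age) {
    $sum_keys += $age{$name};
}

# ...whereas each hands back one key/value pair at a time.
my $sum_each = 0;
while (my ($name, $years) = each %age) {
    $sum_each += $years;
}

print "$sum_keys $sum_each\n";
```

On a three-element hash the saving is nil, of course. It starts to matter when the hash is large or tied to a DBM file.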
-
Delete or undef globals that are no longer in use.
-
Use some kind of DBM to store hashes.
-
Use temp files to store arrays.
-
Use pipes to offload processing to other tools.
-
Avoid list operations and entire file slurps.
-
Avoid using tr///. Each tr/// expression must store a
sizable translation table.
-
Don't unroll your loops or inline your subroutines.
24.2.3. Programmer Efficiency
-
Use defaults.
-
Use funky shortcut command-line switches like -a, -n, -p,
-s, and -i.
-
Use for to mean foreach.
-
Run system commands with backticks.
-
Use <*> and such.
-
Use patterns created at run time.
-
Use *, +, and {} liberally in your patterns.
-
Process whole arrays and slurp entire files.
-
Use getc.
-
Use $`, $&, and $'.
-
Don't check error values on open, since <HANDLE> and
print HANDLE will simply behave as no-ops when given an invalid handle.
-
Don't close your files--they'll be closed on the next open.
-
Don't pass subroutine arguments. Use globals.
-
Don't name your subroutine parameters. You can access them directly as
$_[EXPR].
-
Use whatever you think of first.
24.2.4. Maintainer Efficiency
-
Don't use defaults.
-
Use foreach to mean foreach.
-
Use meaningful loop labels with next and last.
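For example (the data and the label name ROW are invented), a label makes it plain which loop a next applies to:

```perl
use strict;
use warnings;

my @rows = ([1, 2, 3], [4, -1, 6], [7, 8, 9]);   # hypothetical data
my @clean;

ROW: for my $row (@rows) {
    for my $cell (@$row) {
        next ROW if $cell < 0;   # skip the whole row, not just this cell
    }
    push @clean, $row;
}

print scalar(@clean), " rows kept\n";
```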
-
Use meaningful variable names.
-
Use meaningful subroutine names.
-
Put the important thing first on the line using and, or, and
statement modifiers (like exit if $done).
-
Close your files as soon as you're done with them.
-
Use packages, modules, and classes to hide your implementation details.
-
Pass arguments as subroutine parameters.
-
Name your subroutine parameters using my.
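A short contrast, with hypothetical subroutine names, between the terse style and the named style:

```perl
use strict;
use warnings;

# Terse: the reader must remember what $_[0] and $_[1] mean.
sub area_terse { return $_[0] * $_[1] }

# Named: the first line of the subroutine documents its interface.
sub area {
    my ($width, $height) = @_;
    return $width * $height;
}

print area(3, 4), " ", area_terse(3, 4), "\n";
```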
-
Parenthesize for clarity.
-
Put in lots of (useful) comments.
-
Include embedded pod documentation.
-
use warnings.
-
use strict.
24.2.5. Porter Efficiency
-
Wave a handsome tip under his nose.
-
Avoid functions that aren't implemented everywhere. You can use eval
tests to see what's available.
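One common form of such a test, shown here for symlink (the variable name is ours), relies on the fact that an unimplemented function raises an exception that eval can trap:

```perl
use strict;
use warnings;

# symlink dies on platforms that lack it, so trap that in an eval BLOCK.
# The call itself is harmless: linking "" to "" simply fails.
my $has_symlink = eval { symlink("", ""); 1 } ? 1 : 0;

print "symlink is ", ($has_symlink ? "" : "not "), "available\n";
```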
-
Use the Config module or the $^O variable to find out what kind of
machine you're running on.
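A minimal sketch of both approaches. The decision made with $^O here (choosing a null device name) is only an illustration of the kind of thing you might branch on:

```perl
use strict;
use warnings;
use Config;

# $^O names the operating system perl was built for.
my $null_device = $^O eq 'MSWin32' ? 'NUL' : '/dev/null';   # illustrative choice

# %Config holds the details recorded when perl itself was configured.
print "OS: $^O\n";
print "Architecture: $Config{archname}\n";
print "Null device guess: $null_device\n";
```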
-
Don't expect native float and double to pack and unpack on foreign
machines.
-
Use network byte order (the "n" and "N" formats for pack) when
sending binary data over the network.
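For instance, a hypothetical message header holding a 32-bit length and a 16-bit port could be packed and unpacked like this:

```perl
use strict;
use warnings;

my $length = 258;    # 0x00000102, hypothetical field
my $port   = 80;     # 0x0050, hypothetical field

# "N" is a 32-bit unsigned integer and "n" a 16-bit one,
# both in big-endian ("network") order.
my $wire = pack("N n", $length, $port);

# The receiver uses the same template, whatever its native byte order.
my ($len_back, $port_back) = unpack("N n", $wire);

print length($wire), " bytes on the wire; got back $len_back and $port_back\n";
```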
-
Don't send binary data over the network. Send ASCII. Better, send UTF-8.
Better yet, send money.
-
Check $] or $^V to see if the current version supports all the
features you use.
-
Don't use $] or $^V. Use require or use with a version number.
-
Put in the eval exec hack even if you don't use it, so your program
will run on those few systems that have Unix-like shells but don't
recognize the #! notation.
-
Put the #!/usr/bin/perl line in even if you don't use it.
-
Test for variants of Unix commands. Some find programs can't handle the -xdev switch,
for example.
-
Avoid variant Unix commands if you can do it internally. Unix commands
don't work too well on MS-DOS or VMS.
-
Put all your scripts and manpages into a single network filesystem that's
mounted on all your machines.
-
Publish your module on CPAN. You'll get lots of feedback if it's not
portable.
24.2.6. User Efficiency
-
Instead of making users enter data line by line, pop users into
their favorite editor.
-
Better yet, use a GUI like the Perl/Tk extension, where users can
control the order of events. (Perl/Tk is available on CPAN.)
-
Put up something for users to read while you continue doing work.
-
Use autoloading so that the program appears to run faster.
-
Give the option of helpful messages at every prompt.
-
Give a helpful usage message if users don't give correct input.
-
Display the default action at every prompt, and maybe a few
alternatives.
-
Choose defaults for beginners. Allow experts to change the defaults.
-
Use single character input where it makes sense.
-
Pattern the interaction after other things the user is familiar with.
-
Make error messages clear about what needs fixing. Include all
pertinent information such as filename and error code, like this:
open(FILE, $file) or die "$0: Can't open $file for reading: $!\n";
-
Use fork && exit to detach from the terminal when the rest of the script is just batch
processing.
-
Allow arguments to come from either the command line or standard
input.
-
Don't put arbitrary limitations into your program.
-
Prefer variable-length fields over fixed-length fields.
-
Use text-oriented network protocols.
-
Tell everyone else to use text-oriented network protocols!
-
Tell everyone else to tell everyone else to use text-oriented network protocols!!!
-
Be vicariously lazy.
-
Be nice.
Copyright © 2002 O'Reilly & Associates. All rights reserved.