24.4. Fluent Perl
We've touched on a few idioms in the preceding sections (not to mention
the preceding chapters), but there are many other idioms you'll
commonly see if you read programs by accomplished Perl programmers.
When we speak of idiomatic Perl in this context, we don't just mean a
set of arbitrary Perl expressions with fossilized meanings.
Rather, we mean Perl code that shows an understanding of the flow of
the language, what you can get away with when, and what that buys
you. And when to buy it.
We can't hope to list all the idioms you might see--that would take a
book as big as this one. Maybe two. (See the Perl
Cookbook, for instance.) But here are some of the
important idioms, where "important" might be defined as "that which
induces hissy fits in people who think they already know just how
computer languages ought to work".
-
Use => in place of a comma anywhere you think it improves readability:
return bless $mess => $class;
This reads, "Bless this mess into the specified class." Just be careful
not to use it after a word that you don't want autoquoted:
sub foo () { "FOO" }
sub bar () { "BAR" }
print foo => bar; # prints fooBAR, not FOOBAR;
Another good place to use => is near a literal
comma that might get confused visually:
join(", " => @array);
Perl provides you with more than one way to do things so that you can
exercise your ability to be creative. Exercise it!
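To see the autoquoting rule in action, here is a minimal sketch that reuses the foo and bar subroutines from above. The array names are ours, chosen only for illustration; the point is that putting parentheses after foo forces a real function call, while a bare foo to the left of => is treated as a string:

```perl
use strict;
use warnings;

sub foo () { "FOO" }
sub bar () { "BAR" }

my @quoted = (foo   => bar);    # bare word left of => is autoquoted
my @called = (foo() => bar);    # parentheses force the function call

print join("", @quoted), "\n";  # fooBAR
print join("", @called), "\n";  # FOOBAR
```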
-
Use the singular pronoun to increase readability:
for (@lines) {
$_ .= "\n";
}
The $_ variable is Perl's version of a pronoun, and it essentially
means "it". So the code above means "for each line, append a newline to
it." Nowadays you might even spell that:
$_ .= "\n" for @lines;
The $_ pronoun is so important to Perl that its use
is mandatory in grep and map.
Here is one way to set up a cache of common results of an expensive
function:
%cache = map { $_ => expensive($_) } @common_args;
$xval = $cache{$x} || expensive($x);
-
Omit the pronoun to increase readability even further.[1]
-
Use loop controls with statement modifiers.
while (<>) {
next if /^=for\s+(index|later)/;
$chars += length;
$words += split;
$lines += y/\n//;
}
This is a fragment of code we used to do page counts for this book. When
you're going to be doing a lot of work with the same variable, it's
often more readable to leave out the pronouns entirely, contrary to
common belief.
The fragment also demonstrates the idiomatic use of next
with a statement modifier to short-circuit a loop.
The $_ variable is always the loop control variable
in grep and map, but the
program's reference to it is often implicit:
@haslen = grep { length } @random;
Here we take a list of random scalars and only pick the ones that have
a length greater than 0.
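A small sketch of why you might test length rather than truth. The sample data is made up; note that the string "0" has a length but is false, so the two filters disagree about it:

```perl
use strict;
use warnings;

my @random = ("camel", "", "0", "llama");

my @haslen = grep { length } @random;   # implicit $_: drops only ""
my @true   = grep { $_ }     @random;   # truth test: drops "" and "0"

print "@haslen\n";   # camel 0 llama
print "@true\n";     # camel llama
```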
-
Use for to set the antecedent for a pronoun:
for ($episode) {
s/fred/barney/g;
s/wilma/betty/g;
s/pebbles/bambam/g;
}
So what if there's only one element in the loop? It's a convenient
way to set up "it", that is, $_. Linguistically, this is known
as topicalization. It's not cheating, it's communicating.
-
Implicitly reference the plural pronoun, @_.
-
Use control flow operators to set defaults:
sub bark {
my Dog $spot = shift;
my $quality = shift || "yapping";
my $quantity = shift || "nonstop";
...
}
Here we're implicitly using the other Perl pronoun,
@_, which means "them". The arguments to a
function always come in as "them". The shift
operator knows to operate on @_ if you omit it,
just as the ride operator at Disneyland might call out "Next!" without
specifying which queue is supposed to shift. (There's no point in
specifying, because there's only one queue that matters.)
The || can be used to set defaults despite its
origins as a Boolean operator, since Perl returns the first true
value. Perl programmers often manifest a cavalier attitude toward the
truth; the $quantity line above would break if, for instance, you tried to
specify a quantity of 0, because 0 is false and would be silently replaced
by the default. But as long as you never want to set either
$quality or $quantity to a false
value, the idiom works great. There's no point in getting all
superstitious and throwing in calls to defined and
exists all over the place. You just have to
understand what it's doing. As long as it won't accidentally be
false, you're fine.
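Here is a hedged sketch of exactly where the || default goes wrong, and one way around it. The subroutine names are hypothetical, and the // (defined-or) operator is not part of the idiom described above; it assumes Perl 5.10 or later, and only replaces undef rather than every false value:

```perl
use strict;
use warnings;

sub quantity_or {
    my $quantity = shift || "nonstop";   # any false argument gets the default
    return $quantity;
}

sub quantity_dor {
    my $quantity = shift // "nonstop";   # only a missing/undef argument does
    return $quantity;
}

print quantity_or(0),   "\n";   # nonstop (the 0 was lost)
print quantity_dor(0),  "\n";   # 0
print quantity_dor(),   "\n";   # nonstop
```

If you know a false value can never be a legitimate argument, the || form remains perfectly good.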
-
Use
assignment forms of operators, including control flow operators:
$xval = $cache{$x} ||= expensive($x);
Here we don't initialize our cache at all. We just rely on the
||= operator to call
expensive($x) and assign it to
$cache{$x} only if $cache{$x} is
false. The result of that is whatever the new value of
$cache{$x} is. Again, we take the cavalier
approach towards truth, in that if we cache a false value,
expensive($x) will get called again. Maybe the
programmer knows that's okay, because expensive($x)
isn't expensive when it returns false. Or maybe the programmer knows
that expensive($x) never returns a false value at
all. Or maybe the programmer is just being sloppy. Sloppiness can be
construed as a form of creativity.
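To make the cost of caching a false value concrete, here is a small sketch. The expensive function below is a made-up stand-in that returns 0 for even input, and lookup and $calls are names we invented to count how often the real work happens:

```perl
use strict;
use warnings;

my %cache;
my $calls = 0;

# Hypothetical stand-in for an expensive function.
sub expensive {
    my $n = shift;
    $calls++;
    return $n % 2 ? $n * $n : 0;    # false result for even input
}

sub lookup {
    my $x = shift;
    return $cache{$x} ||= expensive($x);
}

lookup(3) for 1 .. 3;               # true result: computed once, then cached
my $odd_calls = $calls;

lookup(4) for 1 .. 3;               # false result: recomputed every time
my $even_calls = $calls - $odd_calls;

print "odd: $odd_calls, even: $even_calls\n";   # odd: 1, even: 3
```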
-
Use loop controls as operators, not just as
statements. And...
-
Use commas
like small semicolons:
while (<>) {
$comments++, next if /^#/;
$blank++, next if /^\s*$/;
last if /^__END__/;
$code++;
}
print "comment = $comments\nblank = $blank\ncode = $code\n";
This shows an understanding that statement modifiers
modify statements, while next is a mere operator. It also shows
the comma being idiomatically used to separate expressions much like
you'd ordinarily use a semicolon. (The difference being that the
comma keeps the two expressions as part of the same statement, under the
control of the single statement modifier.)
-
Use flow control to your advantage:
while (<>) {
/^#/ and $comments++, next;
/^\s*$/ and $blank++, next;
/^__END__/ and last;
$code++;
}
print "comment = $comments\nblank = $blank\ncode = $code\n";
Here's the exact same loop again, only this time with the patterns out in front. The
perspicacious Perl programmer understands that it compiles down to exactly the
same internal codes as the previous example. The if modifier is
just a backward and (or &&) conjunction, and the unless
modifier is just a backward or (or ||) conjunction.
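If you'd rather check the equivalence than take our word for it, here is a sketch that runs both forms over the same input. The sample text and subroutine names are invented for the test, and we read from an in-memory filehandle instead of <> so the example is self-contained. Both forms rely on the comma binding more tightly than either the if modifier or the and operator:

```perl
use strict;
use warnings;

my $text = "# a comment\n\nprint 1;\n# another\nprint 2;\n__END__\nignored\n";

sub count_with_if {
    my ($comments, $blank, $code) = (0, 0, 0);
    open my $fh, '<', \$text or die "can't read string: $!";
    while (<$fh>) {
        $comments++, next if /^#/;
        $blank++,    next if /^\s*$/;
        last              if /^__END__/;
        $code++;
    }
    return "$comments $blank $code";
}

sub count_with_and {
    my ($comments, $blank, $code) = (0, 0, 0);
    open my $fh, '<', \$text or die "can't read string: $!";
    while (<$fh>) {
        /^#/       and $comments++, next;
        /^\s*$/    and $blank++,    next;
        /^__END__/ and last;
        $code++;
    }
    return "$comments $blank $code";
}

print count_with_if(),  "\n";   # 2 1 2
print count_with_and(), "\n";   # 2 1 2
```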
-
Use the implicit loops provided by the -n and -p switches.
-
Don't put a semicolon at the end of a one-line block:
#!/usr/bin/perl -n
$comments++, next LINE if /^#/;
$blank++, next LINE if /^\s*$/;
last LINE if /^__END__/;
$code++;
END { print "comment = $comments\nblank = $blank\ncode = $code\n" }
This is essentially the same program as before. We put an explicit
LINE label on the loop control operators because we felt like it, but
we didn't really need to, since the implicit LINE loop supplied by -n is the innermost
enclosing loop. We used an END to get the final print statement
outside the implicit main loop, just as in awk.
-
Use here docs when the printing gets ferocious.
-
Use a meaningful delimiter on the here doc:
END { print <<"COUNTS" }
comment = $comments
blank = $blank
code = $code
COUNTS
Rather than using multiple prints, the fluent Perl programmer uses a
multiline string with interpolation. And despite our calling it a
Common Goof earlier, we've brazenly left off the trailing
semicolon because it's not necessary at the end of the END block. (If we
ever turn it into a multiline block, we'll put the semicolon back in.)
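As a quick illustration of the interpolation, here is a sketch that builds the same report into a string instead of printing it from an END block. The counts are made-up values and $report is our own name:

```perl
use strict;
use warnings;

my ($comments, $blank, $code) = (2, 1, 2);

# The double-quoted delimiter asks for variable interpolation.
my $report = <<"COUNTS";
comment = $comments
blank = $blank
code = $code
COUNTS

print $report;
```

The terminating COUNTS has to sit at the start of its own line, which is one more reason to pick a delimiter that says what the text is.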
-
Do substitutions and translations en passant on a scalar:
($new = $old) =~ s/bad/good/g;
Since lvalues are lvaluable, so to speak, you'll often see people
changing a value "in passing" while it's being assigned. This could
actually save a string copy internally (if we ever get around to
implementing the optimization):
chomp($answer = <STDIN>);
Any function that modifies an argument in place can do the en passant
trick. But wait, there's more!
-
Don't limit yourself to changing scalars en passant:
for (@new = @old) { s/bad/good/g }
Here we copy @old into @new, changing everything in passing
(not all at once, of course--the block is executed repeatedly, one "it" at a time).
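Here is a short sketch of both en passant forms with made-up data, showing that the originals are left alone while the copies are changed:

```perl
use strict;
use warnings;

my $old = "bad dog, bad cat";
(my $new = $old) =~ s/bad/good/g;     # copy first, then edit the copy

my @old = ("bad apple", "bad egg");
my @new;
for (@new = @old) { s/bad/good/g }    # $_ aliases each element of @new

print "$old -> $new\n";   # bad dog, bad cat -> good dog, good cat
print "@old -> @new\n";   # bad apple bad egg -> good apple good egg
```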
-
Pass named parameters using the fancy => comma operator.
-
Rely on assignment to a hash to do even/odd argument processing:
sub bark {
my Dog $spot = shift;
my %parm = @_;
my $quality = $parm{QUALITY} || "yapping";
my $quantity = $parm{QUANTITY} || "nonstop";
...
}
$fido->bark( QUANTITY => "once",
QUALITY => "woof" );
Named parameters are often an affordable luxury. And with Perl, you
get them for free, if you don't count the cost of the hash assignment.
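A runnable sketch of the same idea follows. The Dog class here is a bare-bones stand-in we made up so the method has something to live in, and bark returns a string rather than doing whatever the elided body above does:

```perl
use strict;
use warnings;

package Dog;

sub new { my $class = shift; return bless {} => $class }

sub bark {
    my $spot = shift;
    my %parm = @_;                                  # pairs become key/value
    my $quality  = $parm{QUALITY}  || "yapping";
    my $quantity = $parm{QUANTITY} || "nonstop";
    return "$quality $quantity";
}

package main;

my $fido    = Dog->new;
my $named   = $fido->bark(QUANTITY => "once", QUALITY => "woof");
my $default = $fido->bark;

print "$named\n";     # woof once
print "$default\n";   # yapping nonstop
```

Because the arguments land in a hash, the caller can supply them in any order and leave out whichever ones it doesn't care about.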
-
Repeat Boolean expressions until false.
-
Use minimal matching when appropriate.
-
Use the /e modifier to evaluate a replacement expression:
#!/usr/bin/perl -p
1 while s/^(.*?)(\t+)/$1 . ' ' x (length($2) * 4 - length($1) % 4)/e;
This program fixes any file you receive from someone who mistakenly
thinks they can redefine hardware tabs to occupy 4 spaces instead
of 8. It makes use of several important idioms. First, the 1 while idiom
is handy when all the work you want to do in the loop is actually done
by the conditional. (Perl is smart enough not to warn you that you're
using 1 in a void context.) We have to repeat this substitution because
each time we substitute some number of spaces in for tabs, we have to
recalculate the column position of the next tab from the beginning.
The (.*?) matches the smallest string it can up until the first tab,
using the minimal matching modifier (the question mark). In this case,
we could have used an ordinary greedy * like this: ([^\t]*). But
that only works because a tab is a single character, so we can use a
negated character class to avoid running past the first tab. In general,
the minimal matcher is much more elegant, and doesn't break if the next
thing that must match happens to be longer than one character.
The /e modifier does a substitution using an expression rather than
a mere string. This lets us do the calculations we need right when
we need them.
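To watch the repeated substitution work, here is the same expression wrapped in a subroutine. The name expand_tabs4 is ours, and it operates on a copy of its argument rather than on $_ under -p:

```perl
use strict;
use warnings;

# Expand tabs to 4-column tab stops, one run of tabs per pass.
sub expand_tabs4 {
    my ($line) = @_;
    1 while $line =~ s/^(.*?)(\t+)/$1 . ' ' x (length($2) * 4 - length($1) % 4)/e;
    return $line;
}

print expand_tabs4("ab\tc"),   "\n";   # "ab" + 2 spaces + "c"
print expand_tabs4("a\tb\tc"), "\n";   # b lands in column 4, c in column 8
```

In the second call the first pass turns the first tab into three spaces; only then can the second pass measure that "a   b" is five characters long and pad the next tab with three more.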
-
Use creative formatting and comments on complex substitutions:
#!/usr/bin/perl -p
1 while s{
^ # anchor to beginning
( # start first subgroup
.*? # match minimal number of characters
) # end first subgroup
( # start second subgroup
\t+ # match one or more tabs
) # end second subgroup
}
{
my $spacelen = length($2) * 4; # account for full tabs
$spacelen -= length($1) % 4; # account for the uneven tab
$1 . ' ' x $spacelen; # make correct number of spaces
}ex;
This is probably overkill, but some people find it more impressive
than the previous one-liner. Go figure.
-
Go ahead and use $` if you feel like it:
1 while s/(\t+)/' ' x (length($1) * 4 - length($`) % 4)/e;
Here's the shorter version, which uses $`, which is
known to impact performance. Except that we're only using the length
of it, so it doesn't really count as bad.
-
Use the offsets directly from the @-
(@LAST_MATCH_START) and @+
(@LAST_MATCH_END) arrays:
1 while s/\t+/' ' x (($+[0] - $-[0]) * 4 - $-[0] % 4)/e;
This one's even shorter. (If you don't see any arrays there, try looking for array elements instead.) See @- and @+ in Chapter 28, "Special Names".
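If the offsets are unfamiliar, this sketch may help. $-[0] is where the whole match starts and $+[0] is where it ends, so their difference is the length of the match and $-[0] alone is the length of whatever precedes it. The sample string and the expand_offsets name are ours:

```perl
use strict;
use warnings;

my $str = "hello world";
my ($start, $end);
if ($str =~ /wor/) {
    ($start, $end) = ($-[0], $+[0]);
}
print "start=$start end=$end\n";     # start=6 end=9

# The same tab expander as above, written with the offsets.
sub expand_offsets {
    my ($line) = @_;
    1 while $line =~ s/\t+/' ' x (($+[0] - $-[0]) * 4 - $-[0] % 4)/e;
    return $line;
}

print expand_offsets("a\tb\tc"), "\n";   # same output as the (.*?) version
```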
-
Use eval with a constant return value:
sub is_valid_pattern {
my $pat = shift;
return eval { "" =~ /$pat/; 1 } || 0;
}
You don't have to use the eval {} operator to return a real value. Here we always return 1 if it gets to the end. However, if the pattern
contained in $pat blows up, the eval catches it and returns undef
to the Boolean conditional of the || operator, which turns it into
a defined 0 (just to be polite, since undef is also false but might
lead someone to believe that the is_valid_pattern subroutine is
misbehaving, and we wouldn't want that, now would we?).
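Here is the subroutine exercised on a few patterns of our own choosing, two of which are deliberately broken:

```perl
use strict;
use warnings;

sub is_valid_pattern {
    my $pat = shift;
    return eval { "" =~ /$pat/; 1 } || 0;
}

my @patterns = ("a+b", "(unclosed", "[a-z", "x{2,3}");
my @results  = map { is_valid_pattern($_) } @patterns;

print "@results\n";   # 1 0 0 1
```

The unmatched parenthesis and bracket would normally kill the program when the pattern is compiled at run time; inside the eval they merely produce a false return value.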
-
Use modules to do all the dirty work.
-
Use object factories.
-
Use callbacks.
-
Use stacks to keep track of context.
-
Use negative subscripts to access the end of an array or string:
use XML::Parser;
$p = new XML::Parser Style => 'subs';
setHandlers $p Char => sub { $out[-1] .= $_[1] };
push @out, "";
sub literal {
$out[-1] .= "C<";
push @out, "";
}
sub literal_ {
my $text = pop @out;
$out[-1] .= $text . ">";
}
...
This is a snippet from the 250-line program we used to translate the
XML version of the old Camel book back into pod format so we could edit
it for this edition with a Real Text Editor.
The first thing you'll notice is that we rely on the XML::Parser
module (from CPAN) to parse our XML correctly, so we don't have to
figure out how. That cuts a few thousand lines out of our program
right there (presuming we're reimplementing in Perl everything
XML::Parser does for us,[2]
including translation from almost any character set into UTF-8).
XML::Parser uses a high-level idiom called an object factory. In
this case, it's a parser factory. When we create an XML::Parser
object, we tell it which style of parser interface we want, and it
creates one for us. This is an excellent way to build a testbed
application when you're not sure which kind of interface will turn out
to be the best in the long run. The subs style is just one of
XML::Parser's interfaces. In fact, it's one of the oldest
interfaces, and probably not even the most popular one these days.
The setHandlers line shows a method call on the parser, not in arrow
notation, but in "indirect object" notation, which lets you omit the
parens on the arguments, among other things. The line also uses the
named parameter idiom we saw earlier.
The line also shows another powerful concept, the notion of a
callback. Instead of us calling the parser to get the next item, we
tell it to call us. For named XML tags like <literal>, this
interface style will automatically call a subroutine of that name (or the name
with an underline on the end for the corresponding end tag). But the
data between tags doesn't have a name, so we set up a Char callback
with the setHandlers method.
Next we initialize the @out array, which is a stack of outputs. We
put a null string into it to represent that we haven't collected any
text at the current tag embedding level (0 initially).
Now is when that callback comes back in. Whenever we see text, it
automatically gets appended to the final element of the array, via the
$out[-1] idiom in the callback. At the outer tag level, $out[-1]
is the same as $out[0], so $out[0] ends up with our whole
output. (Eventually. But first we have to deal with tags.)
Suppose we see a <literal> tag. Then the literal subroutine
gets called, appends some text to the current output, then pushes a new
context onto the @out stack. Now any text up until the closing tag
gets appended to that new end of the stack. When we hit the closing
tag, we pop the $text we've collected back off the @out stack,
and append the rest of the transmogrified data to the new (that is, the
old) end of stack, the result of which is to translate the XML string, <literal>text</literal>, into the corresponding pod string, C<text>.
The subroutines for the other tags are just the same, only different.
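You can try the stack logic without XML::Parser at all. In this sketch we play the part of the parser ourselves, calling the handlers in the order a parser would for the input "Use <literal>print</literal> here."; the char subroutine is our stand-in for the Char callback and takes the text as its only argument:

```perl
use strict;
use warnings;

my @out = ("");                         # stack of outputs, one per tag level

sub char     { $out[-1] .= $_[0] }      # text goes onto the top of the stack
sub literal  { $out[-1] .= "C<"; push @out, "" }
sub literal_ { my $text = pop @out; $out[-1] .= $text . ">" }

char("Use ");
literal();
char("print");
literal_();
char(" here.");

print "$out[0]\n";   # Use C<print> here.
```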
-
Use my without assignment to create an empty array or hash.
-
Split the default string on whitespace.
-
Assign to lists of variables to collect however many you want.
-
Use autovivification of undefined references to create them.
-
Autoincrement undefined array and hash elements to create them.
-
Use autoincrement of a %seen hash to determine uniqueness.
-
Assign to a handy my temporary in the conditional.
-
Use the autoquoting behavior of braces.
-
Use an alternate quoting mechanism to interpolate double quotes.
-
Use the ?: operator to switch between two arguments to a printf.
-
Line up printf args with their % field:
my %seen;
while (<>) {
my ($a, $b, $c, $d) = split;
print unless $seen{$a}{$b}{$c}{$d}++;
}
if (my $tmp = $seen{fee}{fie}{foe}{foo}) {
printf qq(Saw "fee fie foe foo" [sic] %d time%s.\n),
                                      $tmp,  $tmp == 1 ? "" : "s";
}
These nine lines are just chock full of idioms. The first line makes
an empty hash because we don't assign anything to it. We iterate over
input lines setting "it", that is, $_, implicitly,
then using an argumentless split which splits "it"
on whitespace. Then we pick off the four first words with a list
assignment, throwing any subsequent words away. Then we remember the
first four words in a four-dimensional hash, which automatically
creates (if necessary) the first three reference elements and final
count element for the autoincrement to increment. (Under use
warnings, the autoincrement will never warn that you're
using undefined values, because autoincrement is an accepted way to
define undefined values.) We then print out the line if we've never
seen a line starting with these four words before, because the
autoincrement is a postincrement, which, in addition to incrementing
the hash value, will return the old true value if there was one.
After the loop, we test %seen again to see if a
particular combination of four words was seen. We make use of the
fact that we can put a literal identifier into braces and it will be
autoquoted. Otherwise, we'd have to say
$seen{"fee"}{"fie"}{"foe"}{"foo"}, which is a drag
even when you're not running from a giant.
We assign the result of $seen{fee}{fie}{foe}{foo}
to a temporary variable even before testing it in the Boolean context
provided by the if. Because assignment returns its
left value, we can still test the value to see if it was true. The
my tells your eye that it's a new variable, and
we're not testing for equality but doing an assignment. It would also
work fine without the my, and an expert Perl
programmer would still immediately notice that we used one
= instead of two ==. (A
semiskilled Perl programmer might be fooled, however. Pascal
programmers of any skill level will foam at the mouth.)
Moving on to the printf statement, you can see the
qq() form of double quotes we used so that we could
interpolate ordinary double quotes as well as a newline. We could've
directly interpolated $tmp there as well, since
it's effectively a double-quoted string, but we chose to do further
interpolation via printf. Our temporary
$tmp variable is now quite handy, particularly
since we don't just want to interpolate it, but also test it in the
conditional of a ?: operator to see whether we
should pluralize the word "time". Finally, note that we lined up the
two fields with their corresponding % markers in
the printf format. If an argument is too long to
fit, you can always go to the next line for the next argument, though
we didn't have to in this case.
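Here is a condensed, self-contained sketch of the same idioms run over three made-up lines instead of <>. The @unique array and the report subroutine are our own additions so that the results can be inspected:

```perl
use strict;
use warnings;

my @lines = ("fee fie foe foo fum",
             "fee fie foe foo again",
             "one two three four");

my %seen;                                        # empty hash, no assignment
my @unique;
for (@lines) {
    my ($w1, $w2, $w3, $w4) = split;             # extra words are discarded
    push @unique, $_ unless $seen{$w1}{$w2}{$w3}{$w4}++;
}

sub report {
    my $n = shift;
    return sprintf qq(Saw it %d time%s.), $n, $n == 1 ? "" : "s";
}

print scalar(@unique), " unique\n";                   # 2 unique
print report($seen{fee}{fie}{foe}{foo}), "\n";        # Saw it 2 times.
print report($seen{one}{two}{three}{four}), "\n";     # Saw it 1 time.
```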
Whew! Had enough? There are many more idioms we could discuss, but
this book is already sufficiently heavy. But we'd like to
talk about one more idiomatic use of Perl, the writing of program
generators.
Copyright © 2001 O'Reilly & Associates. All rights reserved.