Interpolation, Concatenation, or List (Practical mod

13.4. Interpolation, Concatenation, or List

Let's revisit the various approaches of munging with strings, and compare the speed of using lists of strings versus interpolation. We will add a string concatenation angle as well.

When the strings are small, it almost doesn't matter whether interpolation or a list is used (see Example 13-8).

Example 13-8. benchmarks/join.pl

use Benchmark;
use Symbol;
my $fh = gensym;
open $fh, ">/dev/null" or die;

my($one, $two, $three, $four) = ('a'..'d');

timethese(1_000_000, {
     interp => sub {
         print $fh "$one$two$three$four";
     },
     list => sub {
         print $fh $one, $two, $three, $four;
     },
     conc => sub {
         print $fh $one . $two . $three . $four;
     },
});

Here's the benchmarking result:

 Benchmark: timing 1000000 iterations of conc, interp, list...
    conc:  3 wallclock secs ( 3.38 usr +  0.00 sys =  3.38 CPU)
  interp:  3 wallclock secs ( 3.45 usr + -0.01 sys =  3.44 CPU)
    list:  2 wallclock secs ( 2.58 usr +  0.00 sys =  2.58 CPU)

The results of the concatenation technique are very similar to those of interpolation. The list technique is a little bit faster than interpolation. However, when the strings are large, lists are significantly faster. We saw this in the previous section, and Example 13-9 presents another benchmark to increase our confidence in our conclusion. This time we use 1,000-character strings.

Example 13-9. benchmarks/join_long.pl

use Benchmark;
use Symbol;
my $fh = gensym;
open $fh, ">/dev/null" or die;

my($one, $two, $three, $four) = map { $_ x 1000 } ('a'..'d');

timethese(500_000, {
     interp => sub {
         print $fh "$one$two$three$four";
     },
     list => sub {
         print $fh $one, $two, $three, $four;
     },
     conc => sub {
         print $fh $one . $two . $three . $four;
     },
});

Here's the benchmarking result:

Benchmark: timing 500000 iterations of interp, list...
   conc:  5 wallclock secs ( 4.47 usr +  0.27 sys =  4.74 CPU)
 interp:  4 wallclock secs ( 4.25 usr +  0.26 sys =  4.51 CPU)
   list:  4 wallclock secs ( 2.87 usr +  0.16 sys =  3.03 CPU)

In this case using a list is about 30% faster than interpolation. Concatenation is a little bit slower than interpolation.

Let's look at this code:

$title = 'My Web Page';
print "<h1>$title</h1>";         # Interpolation (slow)
print '<h1>' . $title . '</h1>'; # Concatenation (slow)
print '<h1>',  $title,  '</h1>'; # List (fast for long strings)

When you use "<h1>$title</h1>", Perl does interpolation (since "" is an operator in Perl)—it parses the contents of the string and replaces any variables or expressions it finds with their respective values. This uses more memory and is slower than using a list. Of course, if there are no variables to interpolate it makes no difference whether you use "string" or 'string'.

Concatenation is also potentially slow, since Perl might create a temporary string, which it then prints.

Lists are fast because Perl can simply deal with each element in turn. This is true if you don't run join( ) on the list at the end to create a single string from the elements of the list. This operation might be slower than directly appending to the string whenever a new string springs into existence.

Please note that this optimization is a pure waste of time, except maybe in a few extreme cases (if you have even 5,000 concatenations to serve a request, it won't cost you more than a few milliseconds to do it the wrong way). It's a good idea to always look at the big picture when running benchmarks.

Another aspect to look at is the size of the generated code. For example, lines 3, 4, and 5 in Example 13-10 produce the same output.

Example 13-10. size_interp.pl

$uri = '/test';
$filename = '/test.pl';
print "uri => ",  $uri,  " filename => ",  $filename,  "\n";
print "uri => " . $uri . " filename => " . $filename . "\n";
print "uri => $uri filename => $filename\n";
1; # needed for TerseSize to report the previous line's size

Let's look at how many bytes each line compiles into. We will use B::TerseSize for this purpose:

panic% perl -MO=TerseSize size_interp.pl | grep line
size_interp.pl syntax OK
[line 1 size: 238 bytes]
[line 2 size: 241 bytes]
[line 3 size: 508 bytes]
[line 4 size: 636 bytes]
[line 5 size: 689 bytes]

The code in line 3, which uses a list of arguments to print( ), uses significantly less memory (508 bytes) than the code in line 4, which uses concatenation (636 bytes), and the code in line 5, which uses interpolation (689 bytes).

If there are no variables to interpolate, it's obvious that a list will use more memory then a single string. Just to confirm that, take a look at Example 13-11.

Example 13-11. size_nointerp.pl

print "uri => ",  "uri",  " filename => ",  "filename",  "\n";
print "uri => " . "uri" . " filename => " . "filename" . "\n";
print "uri => uri filename => filename\n";
1; # needed for TerseSize to report the previous line's size

panic% perl -MO=TerseSize size_nointerp.pl | grep line
size_nointerp.pl syntax OK
[line 1 size: 377 bytes]
[line 2 size: 165 bytes]
[line 3 size: 165 bytes]

Lines 2 and 3 get compiled to the same code, and its size is smaller than the code produced by line 1, which uses a list.