The Perl Profiler (Programming Perl)

20.6. The Perl Profiler

Do you want to make your program faster? Well, of course you do. But first you should stop and ask yourself, "Do I really need to spend time making this program faster?" Recreational optimization can be fun,[2] but normally there are better uses for your time. Sometimes you just need to plan ahead and start the program when you're going on a coffee break. (Or use it as an excuse for one.) But if your program absolutely must run faster, you should begin by profiling it. A profiler can tell you which parts of your program take the most time to execute, so you won't waste time optimizing a subroutine that has an insignificant effect on the overall execution time.

[2] Or so says Nathan Torkington, who contributed this section of the book.

Perl comes with a profiler, the Devel::DProf module. You can use it to profile the Perl program in mycode.pl by typing:

perl -d:DProf mycode.pl

Even though we've called it a profiler--since that's what it does--the mechanism DProf employs is the very same one we discussed earlier in this chapter. DProf is just a debugger that records the time Perl entered and left each subroutine.

When your profiled script terminates, DProf will dump the timing information to a file called tmon.out. The dprofpp program that came with Perl knows how to analyze tmon.out and produce a report. You may also use dprofpp as a frontend for the whole process with the -p switch (see described later).

Given this program:

outer();

sub outer {
  for (my $i=0; $i < 100; $i++) { inner() }
}

sub inner {
  my $total = 0;
  for (my $i=0; $i < 1000; $i++) { $total += $i }
}

inner();

the output of dprofpp is:

Total Elapsed Time = 0.537654 Seconds
  User+System Time = 0.317552 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 85.0   0.270  0.269    101   0.0027 0.0027  main::inner
 2.83   0.009  0.279      1   0.0094 0.2788  main::outer

Note that the percentage numbers don't add up to 100. In fact, in this case, they're pretty far off, which should tip you off that you need to run the program longer. As a general rule, the more profiling data you can collect, the better your statistical sample. If we increase the outer loop to run 1000 times instead of 100 times, we'll get more accurate results:

Total Elapsed Time = 2.875946 Seconds
  User+System Time = 2.855946 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 99.3   2.838  2.834   1001   0.0028 0.0028  main::inner
 0.14   0.004  2.828      1   0.0040 2.8280  main::outer

The first line reports how long the program took to run, from start to finish. The second line displays the total of two different numbers: the time spent executing your code ("user") and the time spent in the operating system executing system calls made by your code ("system"). (We'll have to forgive a bit of false precision in these numbers--the computer's clock almost certainly does not tick every millionth of a second. It might tick every hundredth of a second if you're lucky.)

The "user+system" times can be changed with command-line options to dprofpp. -r displays elapsed time, -s displays system time only, and -u displays user time only.

The rest of the report is a breakdown of the time spent in each subroutine. The "Exclusive Times" line indicates that when subroutine outer called subroutine inner, the time spent in inner didn't count towards outer's time. To change this, causing inner's time to be counted towards outer's, give the -I option to dprofpp.

For each subroutine, the following is reported: %Time, the percentage of time spent in this subroutine call; ExclSec, the time in seconds spent in this subroutine not including those subroutines called from it; CumulS, the time in seconds spent in this subroutine and those called from it; #Calls, the number of calls to the subroutine; sec/call, the average time in seconds of each call to the subroutine not including those called from it; Csec/c, the average time in seconds of each call to the subroutine and those called from it.

Of those, the most useful figure is %Time, which will tell you where your time goes. In our case, the inner subroutine takes the most time, so we should try to optimize that subroutine, or find an algorithm that will call it less. :-)

Options to dprofpp provide access to other information or vary the way the times are calculated. You can also make dprofpp run the script for you in the first place, so you don't have to remember the -d:DProf switch:

-p SCRIPT

Tells dprofpp that it should profile the given SCRIPT and then interpret its profile data. See also -Q.

-Q

Used with -p to tell dprofpp to quit after profiling the script, without interpreting the data.

-a

Sort output alphabetically by subroutine name rather than by decreasing percentage of time.

-R

Count anonymous subroutines defined in the same package separately. The default behavior is to count all anonymous subroutines as one, named main::__ANON__.

-I

Display all subroutine times inclusive of child subroutine times.

-l

Sort by number of calls to the subroutines. This may help identify candidates for inlining.

-O COUNT

Show only the top COUNT subroutines. The default is 15.

-q

Do not display column headers.

-T

Display the subroutine call tree to standard output. Subroutine statistics are not displayed.

-t

Display the subroutine call tree to standard output. Subroutine statistics are not displayed. A function called multiple (consecutive) times at the same calling level is displayed once, with a repeat count.

-S

Produce output structured by the way your subroutines call one another:

main::inner x 1         0.008s
main::outer x 1         0.467s = (0.000 + 0.468)s
  main::inner x 100     0.468s

Read this as follows: the top level of your program called inner once, and it ran for 0.008s elapsed time, and the top level called outer once and it ran for 0.467s inclusively (0s in outer itself, 0.468s in the subroutines called from outer) calling inner 100 times (which ran for 0.468s). Whew, got that?

Branches at the same level (for example, inner called once and outer called once) are sorted by inclusive time.

-U

Do not sort. Display in the order found in the raw profile.

-v

Sort by average time spent in subroutines during each call. This may help identify candidates for hand optimization by inlining subroutine bodies.

-g subroutine

Ignore subroutines except subroutine and whatever is called from it.

Other options are described in dprofpp(1), its standard manpage.

DProf is not your only choice for profiler. CPAN also holds Devel::SmallProf, which reports the time spent in each line of your program. That can help you figure out if you're using some particular Perl construct that is being surprisingly expensive. Most of the built-in functions are pretty efficient, but it's easy to accidentally write a regular expression whose overhead increases exponentially with the size of the input. See also the section Section 20.2, "Efficiency" in Chapter 24, "Common Practices", for other helpful hints.

Now go take that coffee break. You'll need it for the next chapter.