System Performance and Profiling (Unix Power Tools, 3rd Edition)

26.1. Timing Is Everything

Whether you are a system administrator or user, the responsiveness of your Unix system is going to be the primary criterion for evaluating your machine. Of course, "responsiveness" is a loaded word. What about your system is responsive? Responsive to whom? How fast does the system need to be to be responsive? There is no one silver bullet that will slay all system latencies, but there are tools that isolate performance bottlenecks, the most important of which you carry on your shoulders.

This chapter deals with issues that affect system performance generally and how you go about finding and attenuating system bottlenecks. Of course, this chapter cannot be a comprehensive guide to tuning your system for your needs, since that is far too dependent on the flavors of Unix and the machines on which they run. However, there are widely available principles and programs that will help you assess how much more performance you can expect from your hardware.

One of the fundamental illusions in a multiuser, multiprocessing operating system like Unix is that every user and every process is made to think that it is alone on the machine. This is by design. At the kernel level, a component called the scheduler attempts to juggle the needs of each user while pursuing several goals at once:

Keeping interactive sessions responsive

Processing batch jobs promptly

Maximizing CPU utilization [81]

Cranking through as many processes per hour as possible

Preventing any particular process from dominating CPU time

[81] This list is modified from Tanenbaum and Woodhull's Operating Systems: Design and Implementation, Second Edition (Upper Saddle River: Prentice-Hall, Inc., 1997), 83.

System performance degrades when one of these goals overwhelms the others. These problems are very intuitive: if there are five times the normal number of users logged into your system, chances are that your session will be less responsive than at less busy times.

Performance tuning is a multifaceted problem. At its most basic, performance issues can be looked at as being either global or local problems. Global problems affect the system as a whole and can generally be fixed only by the system administrator. These problems include insufficient RAM or hard drive space, inadequately powerful CPUs, and scanty network bandwidth. The global problems are really the result of a host of local issues, which all involve how each process on the system consumes resources. Often, it is up to the users to fix the bottlenecks in their own processes.

Global problems are diagnosed with tools that report system-wide statistics. For instance, when a system appears sluggish, most administrators run uptime (Section 26.4) to see how many processes were recently trying to run. If these numbers are significantly higher than normal usage, something is amiss (perhaps your web server has been slashdotted).
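As a rough sketch of that first check, the shell functions below pull the load averages out of uptime's output and test the one-minute figure against a threshold. The function names and the threshold are my own illustrations, not standard commands, and the parsing assumes the usual "load average:" (or "load averages:") wording with a period as the decimal point.

```shell
# Hypothetical helpers; the names are illustrative, not standard commands.

# Print the 1-, 5-, and 15-minute load averages found in uptime output
# read from stdin. Accepts both "load average:" and "load averages:".
load_averages() {
    sed -n 's/.*load averages\{0,1\}: *//p' | tr -d ','
}

# Exit 0 if the 1-minute load average on stdin exceeds the threshold in $1.
# Exits nonzero when the load is lower or when no load figures were found.
load_is_high() {
    load_averages |
        awk -v limit="$1" 'NR == 1 { high = ($1 > limit) } END { exit !high }'
}

# Example (not run here):
#   uptime | load_is_high 4 && echo "busier than usual"
```

What counts as "high" depends on the machine. A common rule of thumb is to compare the load average with the number of CPUs, but your own system's normal numbers are the better baseline.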

If uptime suggests increased activity, the next tool to use is either ps or top to see if you can find the set of processes causing the trouble. Because it shows you "live" numbers, top can be particularly useful in this situation. I also recommend checking the amount of available free disk space with df, since a full filesystem is often an unhappy one, and its misery spreads quickly.
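Here is a minimal sketch of those two follow-up checks. Both function names are hypothetical, and the 90 percent threshold in the example is an arbitrary choice. The ps options used are the portable -A and -o forms; the df parsing assumes the POSIX output format requested by df -P, where the fifth column is the percentage used and the sixth is the mount point.

```shell
# Hypothetical helpers; the names are illustrative, not standard commands.

# List the processes using the most CPU, busiest first (default: top 5).
# The empty "name=" headers ask ps to omit its header line.
busiest_processes() {
    ps -A -o pcpu= -o pid= -o user= -o comm= | sort -k1,1nr | head -n "${1:-5}"
}

# Read `df -P` output on stdin and print each mount point whose usage is at
# or above the percentage in $1. Assumes mount points contain no spaces.
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        used = $5
        sub(/%/, "", used)
        if (used + 0 >= limit + 0) print $6, $5
    }'
}

# Examples (not run here):
#   busiest_processes 10
#   df -P | full_filesystems 90
```

Unlike top, ps gives a single snapshot, so run it a few times before blaming any one process.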

Once particular processes have been isolated as being problematic, it's time to think locally. Process performance suffers when either there isn't enough CPU time available to finish a task (this is known as a CPU-bound process) or the process is waiting for some I/O resource, such as the hard drive or network (i.e., it is I/O-bound). One strategy for dealing with CPU-bound processes, if you have the source code for them, is to use a profiler like GNU's gprof. Profilers give an accounting of how much CPU time is spent in each subroutine of a given program. For instance, if I want to profile one of my programs, I'd first compile it with gcc and use the -pg compilation flag. Then I'd run the program. This creates the gmon.out data file that gprof can read. Now I can use gprof to give me a report with the following invocation:

$ gprof -b executable gmon.out

Here's an abbreviated version of the output:

Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
  0.00      0.00     0.00        2     0.00     0.00  die_if_fault_occurred
  0.00      0.00     0.00        1     0.00     0.00  get_double
  0.00      0.00     0.00        1     0.00     0.00  print_values

Here, we see that three subroutines defined in this program (die_if_fault_occurred, get_double, and print_values) were called. In fact, the first subroutine was called twice. Because this program is neither processor- nor I/O-intensive, no significant time is shown to indicate how long each subroutine took to run. If one subroutine took a significantly longer time to run than the others, or one subroutine is called significantly more often than the others, you might want to see how you can make that problem subroutine faster. This is just the tip of the profiling iceberg. Consult your language's profiler documentation for more details.
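The compile, run, and report steps described above can be strung together in one shell function. This is only a sketch: the function name is hypothetical, it assumes a single C source file whose name ends in .c, and it assumes gcc and gprof are installed and that the program exits normally (gmon.out is written to the current directory at exit).

```shell
# Hypothetical helper; the name is illustrative, not a standard command.
# Usage: profile_c_program source.c [program arguments...]
# Assumes gcc and gprof are installed and the current directory is writable.
profile_c_program() {
    src=$1
    shift
    exe=${src%.c}
    # Make sure the shell runs our binary rather than searching PATH.
    case $exe in
        */*) ;;
        *) exe=./$exe ;;
    esac
    gcc -pg -o "$exe" "$src" || return 1   # -pg adds profiling instrumentation
    "$exe" "$@" || return 1                # a normal exit writes ./gmon.out
    gprof -b "$exe" gmon.out               # -b omits the explanatory text
}

# Example (not run here):
#   profile_c_program myprog.c input.dat
```

A short-lived program will often report "no time accumulated," as in the output shown earlier; the call counts are still useful in that case.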

One less detailed way to look at processes is to get an accounting of how much time a program took to run in user space, in kernel space, and in real time. For this, the time (Section 26.2) command is built into both the C shell and bash. As an external program, /bin/time gives a slightly less detailed report. No special compilation is necessary to use this program, so it's a good tool to use to get a first approximation of the bottlenecks in a particular process.
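One way to turn those three numbers into a first guess is to compare CPU time (user plus kernel) with real time. The sketch below reads the portable "real / user / sys" report that time -p produces and prints CPU time as a percentage of real time. The function name is hypothetical, and the interpretation is only a rule of thumb: a figure near 100% hints at a CPU-bound process, while a low figure means the process spent most of its life waiting, whether for I/O or for other processes.

```shell
# Hypothetical helper; the name is illustrative, not a standard command.
# Read `time -p` output (lines "real N", "user N", "sys N") on stdin and
# print CPU time (user + sys) as a percentage of real time.
cpu_fraction() {
    awk '$1 == "real" { real = $2 }
         $1 == "user" { user = $2 }
         $1 == "sys"  { sys = $2 }
         END {
             if (real > 0)
                 printf "%.0f%%\n", 100 * (user + sys) / real
             else
                 print "n/a"
         }'
}

# Example for bash (not run here). The timing report goes to standard error,
# so the braces and 2>&1 are needed to send it down the pipe:
#   { time -p sort bigfile > /dev/null; } 2>&1 | cpu_fraction
```

A multithreaded program running on several CPUs can report more than 100%, and a run too short to measure prints n/a.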

Resolving I/O-bound issues is difficult for users. Only administrators can tweak the low-level system settings that control system I/O buffering and install new hardware, if needed. CPU-bound processes might be improved by dividing the program into smaller programs that feed data to each other. Ideally, these smaller programs can be spread across several machines. This is the basis of distributed computing.
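On a single machine, the nearest everyday analogue of "smaller programs that feed data to each other" is the shell pipeline. The word-counting function below is only an illustration of the idea, not a distributed system, and its name is hypothetical. Each stage is a separate process, so the kernel is free to run the stages at the same time on different CPUs.

```shell
# Hypothetical helper; the name is illustrative, not a standard command.
# Read text on stdin and print the most frequent words (default: top 10).
# Each stage of the pipeline runs as its own process.
word_frequency() {
    tr -cs 'A-Za-z' '\n' |          # split the text into one word per line
        tr 'A-Z' 'a-z' |            # fold everything to lowercase
        sort |                      # group identical words together
        uniq -c |                   # count each group
        sort -k1,1nr -k2 |          # highest count first, then alphabetical
        head -n "${1:-10}"
}

# Example (not run here):
#   word_frequency 20 < chapter.txt
```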

Sometimes, you want a particular process to hog all the system resources. This is the definition of a dedicated server, like one that hosts the Apache web server or an Oracle database. Often, server software will have configuration switches that help the administrator allocate system resources based on typical usage. This, of course, is far beyond the scope of this book, but do check out Web Performance Tuning and Oracle Performance Tuning from O'Reilly for more details. For more system-wide tips, pick up System Performance Tuning, also from O'Reilly.

As with so many things in life, you can improve performance only so much. In fact, by improving performance in one area, you're likely to see performance degrade in other tasks. Unless you've got a machine that's dedicated to a very specific task, beware the temptation to over-optimize.

-- JJ
