Administration (Webmaster in a Nutshell, 3rd Edition)

20.4. Administration

System administrators have the opportunity to improve performance in ways that users, HTML authors, and programmers do not, because administrators deal directly with web and application servers.

Apache, by far the most successful web server, is used on more than half of all web sites. Apache has real-time performance monitoring tools and an optional log format that tells you how long each transfer took (see mod_log_config.html in the server documentation). Try to ensure that the server has been compiled with the latest C compiler and libraries for your server platform, or compile the server yourself.
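As a rough sketch of the optional log format mentioned above, the following appends the %T field (the time taken to serve the request, in seconds) to the Common Log Format. The nickname timed and the log file path are assumptions chosen for illustration; adjust them to your installation.

```apache
# "timed" is a hypothetical nickname; the log path is an assumption
LogFormat "%h %l %u %t \"%r\" %>s %b %T" timed
CustomLog logs/access_log timed
```

Each log line then ends with the number of seconds the transfer took, which makes slow requests easy to spot with ordinary text tools.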

See Dean Gaudet's notes on tuning Apache servers at http://www.apache.org/docs/misc/perf-tuning.html.

20.4.1. AllowOverride

In the kind of full-path authentication used on Apache and some other servers, the current directory and each parent directory (up to the system root, not just up to the document root) are searched by default for a .htaccess file to read and parse. You can speed up Apache by turning off this search for directories that don't need it (like the system root) and enabling it only where it is required. To do so, put the following in the access.conf file:

<Directory />
     AllowOverride None
</Directory>

<Directory /usr/local/mydocroot>
     # or any of the other AllowOverride options
     AllowOverride All
</Directory>

Even better, if you don't use .htaccess files at all, disable them completely:

<Directory /usr/local/mydocroot>
     AllowOverride None
</Directory>

The general web performance tip of keeping paths short takes on added importance for web servers that use directory-specific access control like Apache. Each directory traversal takes time not only because it follows the filesystem's linked list and checks Unix permissions, but also because of the .htaccess files, which are even less efficient.

20.4.2. Buffered Logs

For better performance, compile Apache with the -DBUFFERED_LOGS option so that log file writes are deferred until a certain number of bytes are accumulated. That number is the POSIX constant PIPE_BUF.

20.4.3. MaxClients

You'll get much better performance by running only the number of server processes your RAM can hold. If you run too many, you'll start swapping and performance for each will drop. Knowing how many to run is tricky because some parts of each httpd process are shared with others, but a good rule of thumb for Apache is 1 MB per process. That is, if you have only 128 MB RAM, then don't try to run any more than 128 processes, even if you often have more than 128 concurrent users. Anyhow, you'll probably be limited by other factors at 128 processes.

You can configure the maximum number of httpd processes in Apache with the MaxClients directive. You don't want to run too few httpd processes either, because you need enough processes so that fast clients never have to wait for a slow client to finish and free up a process.
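For instance, on the 128 MB machine described above, and assuming the rule of thumb of roughly 1 MB per process holds for your build, the setting might look like the following. The figure is only an upper bound derived from that rule; measure your own httpd process sizes before relying on it.

```apache
# Upper bound from the 1 MB per process rule of thumb on a 128 MB machine
MaxClients 128
```

A somewhat lower value leaves memory for the operating system and for any other programs running on the same machine.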

20.4.4. Persistent Connections

In older versions of Apache, the connection between client and server was closed after each transaction and needed to be reopened for every subsequent request. This made sense when you could assume that one page corresponded to one resource on the server, but became a huge performance liability as pages were increasingly crowded by images and multiple frames. The modern HTTP standard now supports persistent connections, also known as keepalives, whereby a connection stays open for further requests until the client or the server explicitly closes it.

To take full advantage of persistent connections, make sure the KeepAlive directive is set to On in httpd.conf. (KeepAlive is On by default with Apache 1.1 and higher.) Set the number of allowed requests per connection to a largish number (MaxKeepAliveRequests 100) to save the overhead of setting up new connections. Set the timeout fairly low, say 15 seconds (KeepAliveTimeout 15), so that clients that disconnect from the server without closing the connection properly are timed out quickly. A timeout of 15 seconds should be sufficient if your customers generally come from a LAN; use 30 seconds for modem customers.
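Put together, the settings described above might appear in httpd.conf as in the following sketch. The values are the ones suggested in the text for LAN clients; treat them as a starting point rather than a recommendation for every site.

```apache
# Allow persistent connections
KeepAlive On
# Requests allowed on one connection before it is closed
MaxKeepAliveRequests 100
# Seconds to wait for the next request (consider 30 for modem users)
KeepAliveTimeout 15
```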

20.4.5. Reverse DNS

The server is only given the IP address of the calling browser, but Apache and other servers can then translate the IP address to a fully-qualified hostname using DNS reverse lookup. This hostname is then available to CGI programs and is used in log files. Having a hostname instead of the IP address is convenient, but DNS reverse lookup takes up precious time in the transaction. Furthermore, you don't really need reverse DNS: log file analysis programs (such as the logresolve program that comes with Apache) can look up names offline, and CGI programs can do a reverse lookup themselves if they really need to.

As of Apache 1.3, reverse DNS is off by default. In older versions of Apache, edit the HostnameLookups directive in httpd.conf:

HostnameLookups off

The hazard of DNS is that it uses blocking system calls, which hang the entire server process until the call completes. DNS calls can take a noticeable amount of time for a single user, so a server servicing many users sees a large drag on performance from DNS lookups.

20.4.6. Do Not Restrict by Domain

You can allow and restrict requests from specific domains using the allow from and deny from directives in httpd.conf. However, using allow and deny with domain names hurts performance twice. First, a reverse DNS lookup is done to check the domain of the client browser, and then a normal DNS lookup is done to be sure that the reverse lookup was not faked.

Instead of using domain names, use IP addresses with allow and deny, so DNS doesn't get involved.
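A minimal sketch of an IP-based restriction follows. The directory path and the address prefix are hypothetical (192.0.2 is a range reserved for documentation); substitute your own.

```apache
# Hypothetical directory and network prefix, for illustration only
<Directory /usr/local/mydocroot/private>
     Order deny,allow
     Deny from all
     # A partial IP address matches every host on that network
     Allow from 192.0.2
</Directory>
```

Because the rule is expressed purely in addresses, the server can evaluate it against the client's IP address without consulting DNS.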

20.4.7. Set FollowSymLinks

Set the FollowSymLinks option to avoid the lstat system call that would otherwise have to be performed on every element of a path to check whether it is a symbolic link, every single time that path is requested. Here is an example of how to configure it:

DocumentRoot /www/htdocs
<Directory /> 
        Options FollowSymLinks
</Directory>

Beware, however, that this effectively turns off security for symbolic links. This means Unix-level users could then make a link point to any readable file on your server and actually serve that file.

20.4.8. FancyIndexing Off

One problem with Apache is fancy indexing. If FancyIndexing is set to On, then whenever you access a directory lacking an index.html, an HTML listing of the directory contents is generated on the fly and returned to the user. The fancy version of this directory listing uses different icons for different types of files, and assumes you have installed the icons that ship with the server in the /icons directory. If you fail to install these icons, useless network traffic is generated looking for the icons, holding up the rendering of the page. This occurs every time you view the directory page, even if you set your browser to always use cached content and never check the network, because the missing images aren't in the cache. You could install the icons, but in the spirit of "doing less" (as we proclaimed towards the beginning of this chapter), we prefer just to turn FancyIndexing Off.
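In Apache 1.3 this might be written as follows in httpd.conf. Treat it as a sketch for that release line, since later releases control the same behavior through the IndexOptions directive instead.

```apache
# Generate plain directory listings without icons
FancyIndexing off
```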

20.4.9. Use Specific Index Files

Instead of using a wildcard such as:

DirectoryIndex index

use a complete list of options:

DirectoryIndex index.cgi index.pl index.html

where you list the most common choice first.

20.4.10. MaxRequestsPerChild

MaxRequestsPerChild is the number of requests a child process will be allowed to serve before it is killed. The idea is to pre-empt memory leaks in the Apache code and in the system libraries. The default under Apache 1.3.9 seems to be 100, but this is far too low. Set it to 10,000 to avoid much of the overhead of spawning new child processes. Keep an eye on the size of your httpd processes and if they don't seem to grow, you can probably increase it to 100,000 or more safely.
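Using the starting value suggested above, the directive might look like this sketch. Whether 10,000 suits your site is an assumption to verify by watching process sizes over time.

```apache
# Recycle each child after 10,000 requests to contain slow memory leaks
MaxRequestsPerChild 10000
```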

20.4.11. Some Notes on Sizing Apache

For Apache, which handles load by dishing out requests to many child processes, you want to initially start as many processes as the number of simultaneous connections you expect. You specify the number of initial servers with the StartServers directive. A value of 10 is plenty for small sites, but would be inadequate for very busy sites.

Apache pre-spawns the number of processes specified and each process then waits for incoming connections. If you configure too many processes, then the select() call will have too much work to do. If you configure too few, you will find yourself forking at a time you can least afford to. However, as of Apache 1.3, forking rates double every second; that is, one child is forked the first second, then two the second second, then four the third second, and so on. This should be fast enough for most sites to cope with variations in load. The minimum (specified by the MinSpareServers directive) should be the average number of processes, plus a few for variation in load. The maximum should be the maximum the machine can handle, usually determined by memory size.
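As an illustration, the process-pool directives for a small site might be combined as in the following sketch. Only the StartServers value comes from the text; the other numbers, and the use of MaxSpareServers (which the discussion above does not mention), are assumptions to be tuned against your own load and memory.

```apache
# Processes created at startup (10 is plenty for small sites)
StartServers 10
# Keep at least this many idle processes ready for bursts (assumed value)
MinSpareServers 5
# Idle processes beyond this count are killed off (assumed value)
MaxSpareServers 20
# Hard ceiling, usually set by available memory (assumed 128 MB machine)
MaxClients 128
```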

20.4.12. Using mod_status

If you include mod_status and set Rule STATUS=yes when building Apache, then on every request Apache will perform extra timing calls so that the status report generated will include timings. This slows down performance, but gives you performance data. Take your pick.
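Once mod_status is built in, a configuration along the following lines makes the report available. This is a sketch: the /server-status location and the localhost-only restriction are assumptions chosen for illustration, and the ExtendedStatus directive is available only in Apache 1.3.2 and later.

```apache
# Collect the extra timing information (Apache 1.3.2 and later)
ExtendedStatus On

# Hypothetical location; restricted to the local machine by IP address
<Location /server-status>
     SetHandler server-status
     Order deny,allow
     Deny from all
     Allow from 127.0.0.1
</Location>
```

Restricting access by IP address rather than by hostname keeps the status page itself from triggering DNS lookups.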