Bottlenecks and performance issues.
Bottlenecks and performance tuning.
Squid NOVM, by comparison, avoids using excess memory by transferring the fetched portions of in-transit objects to disk as quickly as possible. This trades off memory usage against file I/O and file descriptor usage.
Either version of Squid also needs memory to store its meta-object database. This database contains a small record (typically around 100 bytes) describing each object that Squid has cached. If your cache holds 400,000 objects, you will need around 40 MB of memory simply to store this metadata. For the default configuration (with an average object size of 20 KB) you need around 5 MB of RAM for each 1 GB of cache storage. See the section on Malloc problems and memory usage for more information.
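The arithmetic behind these figures can be sketched as follows. This is only an illustration, not part of Squid: the function names are made up for this example, the 100-byte record size is the rough figure quoted above, and decimal units (1 MB = 1,000,000 bytes) are assumed for simplicity.

```python
# Rough estimate of the memory needed for Squid's per-object metadata.
# Assumptions (illustrative only): about 100 bytes per record, decimal units.

KB = 1000
MB = 1000 * KB
GB = 1000 * MB


def metadata_bytes(num_objects, bytes_per_record=100):
    """Memory needed to describe num_objects cached objects."""
    return num_objects * bytes_per_record


def metadata_bytes_for_cache(cache_bytes, avg_object_bytes=20 * KB,
                             bytes_per_record=100):
    """Memory needed for a cache of cache_bytes, given an average object size."""
    num_objects = cache_bytes // avg_object_bytes
    return metadata_bytes(num_objects, bytes_per_record)


if __name__ == "__main__":
    print(metadata_bytes(400000) / MB)            # metadata for 400,000 objects, in MB
    print(metadata_bytes_for_cache(1 * GB) / MB)  # metadata per 1 GB of cache, in MB
```

If your average object size is smaller than 20 KB, the same amount of disk holds more objects, so the metadata takes proportionally more memory.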
If the system runs low on RAM and your operating system has to page out parts of the memory that Squid is using, your cache performance will drop dramatically.
The first decision you need to make when considering the performance of your cache is which version of Squid to run. If you don't have a lot of memory to play with, consider NOVM to get the maximum benefit from the memory you do have. If you are serving a large user population or have heavy peaks in accesses, go for Squid Classic to minimise delays. A study by Duane Wessels, author of Squid, found little substantial difference between the two in terms of performance. Squid 1.2 is supposed to be a 'unified' version that combines the best of both. At this stage, however, 1.2 is still in alpha.
Specific Bottlenecks.
Use top, pstat -s or equivalent to look at swap space usage (also have a look at the system-software page for more commands); substantial swap usage indicates a problem. Consider switching to Squid NOVM if you're not running it already. Running low on swap space also means that your system has less space available for network and disk buffers, so you can slow down on two fronts: the kernel does not have sufficient resources to push out the data that Squid is feeding it, and Squid does not have enough to push data out in the first place.
Once your cache gets busy, disk I/O can become the bottleneck. Since even a moderate sized cache (handling around 200,000 transactions per day) can easily peak at more than 50 transactions per second, the I/O subsystem can play a significant part in slowing the cache down. There are several ways to improve the I/O capabilities of your cache server:
cache_dir parameter. Ideally each cache disk should be on its own SCSI controller for maximum throughput. On a loaded cache, an IDE disk is generally not a good idea.
Network bottlenecks cannot be fixed by any means other than upgrading the network infrastructure. If you have a lot of sibling or parent caches, multicast ICP may be more efficient.
Some people (the author of most of this document, for one) have had success with what are often called 'cache-clusters'. These are groups of caches which sit very close together on the network and essentially pretend to be one cache. Here are some of the advantages:
cache A 196.4.160.2
A 196.4.160.8
Note that even with the hosts set up as siblings you will still get some duplication of objects. This is how it happens:
Object A is on cache1. A user wants to download this object, but in this case he connects to cache2 (because of the random rotation of the caches). He hits 'shift-reload' on the page, though, so the browser tells cache2 "don't give me this from disk". Since the request carries a header to this effect, cache2 goes directly to the origin server and downloads another copy, rather than checking with its siblings. There are thus two copies, one in cache1 and one in cache2.
Note that this can cause problems when there is an object that you want to force to expire (such as a page that you have updated or that is corrupt). Hitting shift-reload won't clear the object from EVERY cache, since the next person to come along may hit cache2 when you cleared the object from cache1. Caches querying each other don't download the newest object from their sibling caches; they simply get it from the one that responds fastest. You should use the cachemgr.cgi script to clear the object from each and every cache, one by one.
Distributed sibling cache machines can become very effective when you use them with the mechanism described in this page. Essentially, the auto-config mentioned in that page allows you to split requests across multiple machines with a hash table, meaning that you completely remove all duplication and the load is balanced equally across all machines. This idea is supposedly being included in the new Microsoft proxy.
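The hash-based split can be illustrated with a small sketch. This is not the auto-config script itself (which runs in the browser); it is a Python illustration of the idea only. The cache host names, the port, and the choice of SHA-256 as the hash are all assumptions made for this example.

```python
# Illustration of splitting requests across caches by hashing the URL.
# The host names below are hypothetical; any stable hash function would do.
import hashlib

CACHES = ["cache1.example.com:3128", "cache2.example.com:3128"]


def pick_cache(url, caches=CACHES):
    """Always map the same URL to the same cache, so only one cache stores it."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then take it modulo
    # the number of caches to select one of them.
    index = int.from_bytes(digest[:8], "big") % len(caches)
    return caches[index]


if __name__ == "__main__":
    for url in ["http://www.example.com/", "http://www.example.com/a.gif"]:
        print(url, "->", pick_cache(url))
```

Because every client computes the same answer for a given URL, an object is only ever fetched through one cache. With this simple modulo scheme, adding or removing a cache changes the mapping for most URLs, so the caches have to refill afterwards.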
Here are some URLs for system-performance/filedescriptor enhancement:
Linux
Digital
General from the squid FAQ
The Squid Users guide is copyright Oskar Pearson (oskar@is.co.za). This page is joint copyright Julian Anderson and Oskar Pearson.
If you like the layout (I do), I can only thank William Mee and hope he forgives me for stealing it. This section was almost entirely contributed by Julian Anderson (julian.anderson@mcs.vuw.ac.nz)