Bottlenecks and performance issues.

Bottlenecks and performance tuning.
As of version 1.1, Squid has come in two flavours, Classic Squid and Squid NOVM. The key difference is in the Squid virtual memory system. In Classic Squid, spare memory is devoted to in-transit objects and to "hot" objects. In-transit objects are objects that are being fetched from remote servers, while "hot" objects are objects that Squid has decided are popular enough to keep a copy in memory to speed up access. Select the VM_Objects page of the cache manager to see the list of in-transit and hot objects.

Squid NOVM, by comparison, avoids using excess memory by transferring the fetched portions of in-transit objects to disk as quickly as possible. This trades off memory usage against file I/O and file descriptor usage.

Either version of Squid also needs memory to store its meta-object database. This database contains a small record (typically around 100 bytes) describing each object that Squid has cached. If your cache holds 400,000 objects, you will need around 40MB of memory simply to store this metadata. For the default configuration (with an average object size of 20KB) you need around 5MB of RAM for each 1GB of cache storage. See the section on Malloc problems and memory usage for more information.
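
The arithmetic behind these figures can be sketched as follows. This is only a rough estimate: the 100 bytes per object and the 20KB average object size come from the text above, while the function name and the use of decimal units (1GB = 1,000,000KB) are assumptions made for the example.

```python
# Rough estimate of the memory Squid needs for its meta-object database.
# 100 bytes per record and a 20KB mean object size are the figures quoted
# in the text; decimal units (1GB = 1,000,000KB) are an assumption.

BYTES_PER_OBJECT = 100   # typical size of one metadata record
MEAN_OBJECT_KB = 20      # average object size in the default configuration


def metadata_mb(cache_gb, mean_object_kb=MEAN_OBJECT_KB,
                bytes_per_object=BYTES_PER_OBJECT):
    """Return the approximate metadata memory (MB) for cache_gb of disk."""
    objects = cache_gb * 1_000_000 / mean_object_kb   # objects that fit on disk
    return objects * bytes_per_object / 1_000_000     # bytes -> MB


if __name__ == "__main__":
    for gb in (1, 4, 8):
        print(f"{gb}GB of cache disk -> about {metadata_mb(gb):.0f}MB of RAM")
```

If your objects are smaller on average than 20KB, more of them fit on the same disk and the metadata needs correspondingly more memory, so it may be worth passing your own measured average as mean_object_kb.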

If the system runs low on RAM and your operating system has to page out parts of the memory that Squid is using, your cache performance will drop dramatically.

The first decision you need to make when considering the performance of your cache is to choose the appropriate version of Squid. If you don't have a lot of memory to play with, consider NOVM to get the maximum benefit from the memory you do have. If you are serving a large user population or have heavy peaks in accesses, go for Classic Squid to minimise delays. A study by Duane Wessels, author of Squid, has found little substantial difference between the two in terms of performance. Squid 1.2 is supposed to be a 'unified' version that combines the best of both. At this stage, however, 1.2 is still in Alpha.

Specific Bottlenecks.
The bottlenecks your cache will encounter depend on the size of the cache and the browsing habits of your site. The one first encountered is usually lack of memory. Since the amount of disk you can service is proportional to the memory you have, at the (approximate) ratio of 5MB of memory per 1GB of disk, you can quickly consume all available memory by configuring too much cache disk. If this occurs, your operating system will begin to swap and, eventually, thrash. Squid responds *very* badly to low memory conditions; performance falls dramatically. Use top, pstat -s or equivalent to look at swap space usage (also have a look at the system-software page for more commands); substantial swap usage indicates a problem. Consider switching to Squid NOVM if you're not running it already. Heavy swapping also means that your system has less memory available for network and disk buffers, so you can slow down on two fronts: the kernel does not have sufficient resources to push out the data that Squid is feeding it, and Squid does not have enough to push data out in the first place.

Once your cache gets busy, disk I/O can become the bottleneck. Since even a moderately sized cache (handling around 200,000 transactions per day) can easily peak at more than 50 transactions per second, the I/O subsystem can play a significant part in slowing the cache down. There are several ways to improve the I/O capabilities of your cache server:

CPU limitations are rarely encountered except in very large caches, unless you have particularly complicated ACLs. Consider compiling with USE_BIN_TREE if you have many ACLs; the default is a linear search through the ACL list. Another option is to turn off and compile out all debugging (a preliminary profile suggests Squid spends between 10% and 15% of its cycles on debugging statements).
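
To see why the search method matters once an ACL grows large, here is a small sketch comparing a linear scan with a binary search. It is an illustration only and not Squid's code: USE_BIN_TREE builds a tree inside Squid, whereas this example uses a sorted Python list, and the host names and function names are made up for the demonstration.

```python
# Compare how many entries are examined by a linear scan and by a binary
# search. Illustration only: the entries and names below are hypothetical.

def linear_match(entries, key):
    """Check entries one by one; return (found, entries_examined)."""
    probes = 0
    for entry in entries:
        probes += 1
        if entry == key:
            return True, probes
    return False, probes


def binary_match(sorted_entries, key):
    """Binary search a sorted list; return (found, entries_examined)."""
    lo, hi = 0, len(sorted_entries)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_entries[mid] == key:
            return True, probes
        if sorted_entries[mid] < key:
            lo = mid + 1      # key can only be in the upper half
        else:
            hi = mid          # key can only be in the lower half
    return False, probes


if __name__ == "__main__":
    acl = sorted(f"host{n:05d}.example.com" for n in range(10000))
    worst = acl[-1]           # last entry: worst case for the linear scan
    print("linear:", linear_match(acl, worst))
    print("binary:", binary_match(acl, worst))
```

With 10,000 entries the linear scan may examine every entry for each request, while the binary search needs at most 14 steps. With only a handful of ACL entries the difference is negligible.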

Network bottlenecks cannot be fixed by any means other than upgrading the network infrastructure. If you have a lot of sibling or parent caches, multicast ICP may be more efficient.

Some people (the author of most of this document, for one) have had successes with what are often called 'cache-clusters'. These are groups of caches which sit very close together on the network and essentially pretend to be one cache. Here are some of the advantages:

Most cache-clusters work as follows:
You have 2 (or more) cache machines on the same subnet. There is one A record in the DNS that points to both IP addresses (this is possible; ask your DNS guru for help here). When you set up a browser, you point it at the name that references both machines. (Note that Netscape version 3 selects one of the IP addresses that the A record points to and uses only that IP until you restart it.) The caches are set up with each other as siblings, so that when one gets a request for an object that it doesn't have on its disk, it asks its siblings, which then check their disks. A sibling that has the object replies with an 'Over-here!' response, and the original cache then connects to it and downloads the object. This means that you essentially get a 'distributed disk': by adding 1 GB of disk to each machine you add 2 GB to the overall disk cache available for storing objects.
In your DNS configuration file: cache A A
Browsers are simply set up to point to ''.
Note that this will only really be useful if you have 2 caches which are identically configured, otherwise the more powerful one won't be used to its full potential. You can't (unfortunately) get around this by adding more occurrences of the same IP address to the DNS, since bind strips out duplicates. The slowest cache then becomes the 'limiting factor'.

Note that even with the hosts set up as siblings you will still get some duplication of objects. This is how it happens:
Object A is on cache1. A user wants to download this object, but in this case he connects to cache2 (because of the random rotation of the caches). He hits 'shift-reload' on the page, so the browser tells cache2 "don't give me this from disk". Since this header is present in the request, cache2 goes directly to the origin server and downloads another copy, rather than checking with its siblings. There are thus 2 copies, one in cache1 and one in cache2.
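
The sequence above can be modelled with a toy simulation. This is a sketch under stated assumptions and not how Squid is implemented: the Cache class, its method names and the example URL are hypothetical, the sibling query is a plain method call rather than ICP, and an object fetched from a sibling is assumed not to be stored locally, in line with the 'distributed disk' description.

```python
# Toy model of two sibling caches, showing how a forced reload leaves a
# second copy of an object in the cluster. All names are hypothetical.

class Cache:
    def __init__(self, name):
        self.name = name
        self.store = {}       # url -> object held on this cache's disk
        self.siblings = []    # other caches in the cluster

    def has(self, url):
        """Answer a sibling's query: is this object on our disk?"""
        return url in self.store

    def fetch(self, url, origin, no_cache=False):
        """Serve a client request; return (object, where it came from)."""
        if not no_cache:
            if url in self.store:
                return self.store[url], self.name      # local hit
            for sib in self.siblings:
                if sib.has(url):
                    # Sibling hit. Assumption: no local copy is kept.
                    return sib.store[url], sib.name
        body = origin[url]          # miss or forced reload: go to the origin
        self.store[url] = body      # and keep a copy on this cache
        return body, "origin"


if __name__ == "__main__":
    url = "http://www.example.com/a"
    origin = {url: "object A"}
    cache1, cache2 = Cache("cache1"), Cache("cache2")
    cache1.siblings, cache2.siblings = [cache2], [cache1]

    cache1.fetch(url, origin)                          # object A lands on cache1
    print(cache2.fetch(url, origin))                   # served via sibling cache1
    print(cache2.fetch(url, origin, no_cache=True))    # shift-reload: origin
    print([c.name for c in (cache1, cache2) if c.has(url)])  # both hold a copy
```

After the forced reload both caches hold object A, which is the duplication described above.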

Note that this can cause problems when there is an object that you want to force the expiration of (such as a page that you have updated or that is corrupt). Hitting shift-reload won't clear the object from EVERY cache, since the next person to come along may hit cache2 when you cleared the object from cache1. Caches querying each other don't download the newest copy of an object from their siblings; they simply get it from the one that responds fastest. You should use the cachemgr.cgi script to clear the object from each and every cache, one by one.

Distributed sibling cache machines can become very effective when you use them with the mechanism described in this page. Essentially, the auto-config mentioned in that page allows you to split requests across multiple machines with a hash table, meaning that you completely remove all duplication and the load is balanced equally across all machines. This idea is supposedly being included in the new Microsoft proxy.
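
The idea of splitting requests by hash can be sketched as follows. A real auto-config file is evaluated by the browser and would not be written in Python; this example only illustrates the principle. The cache host names, the function name and the choice of CRC32 as the hash are all assumptions made for the example.

```python
# Pick a cache for each URL by hashing the URL. The same URL always maps to
# the same cache, so no object is stored twice in the cluster.
# Host names and the CRC32 hash are assumptions for illustration.
import zlib

CACHES = ["cache1.example.com", "cache2.example.com"]   # hypothetical hosts


def pick_cache(url, caches=CACHES):
    """Return the cache responsible for url."""
    index = zlib.crc32(url.encode("utf-8")) % len(caches)
    return caches[index]


if __name__ == "__main__":
    for n in range(6):
        url = f"http://www.example.com/page{n}.html"
        print(url, "->", pick_cache(url))
```

Because every browser computes the same answer for a given URL, a shift-reload goes to the cache that already holds the object instead of creating a second copy elsewhere. The split is only roughly equal, and it depends on how evenly the hash spreads the URLs your users actually request.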

Here are some URLs for system-performance/filedescriptor enhancement:
General from the squid FAQ

The Squid Users guide is copyright Oskar Pearson. This page is joint copyright Julian Anderson and Oskar Pearson.

If you like the layout (I do), I can only thank William Mee and hope he forgives me for stealing it. This section was almost entirely contributed by Julian Anderson (