Concepts behind internet caching
Squid handles these circumstances elegantly (but needs the remote sites to work according to the standards, of course).
Executable cgi-bin scripts are not cached, pages that return the correct headers are cached for limited periods of time, and you can specify extra rules as to what, and what not, to cache, and for how long to cache it.
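The rules described above are set in Squid's configuration file. A minimal sketch follows; the directive names are real Squid directives, but the exact syntax varies between Squid versions (older releases use "no_cache deny" instead of "cache deny"), and the refresh times shown are purely illustrative values, not recommendations:

```
# squid.conf fragment (illustrative values only)

# Never cache dynamic content: cgi-bin scripts and URLs with query strings
acl QUERY urlpath_regex cgi-bin \?
cache deny QUERY

# refresh_pattern: regex  min(minutes)  percent-of-object-age  max(minutes)
# FTP objects rarely change, so keep them longer
refresh_pattern ^ftp:   1440   20%   10080
# Everything else: fall back to conservative defaults
refresh_pattern .       0      20%   4320
```

The refresh_pattern lines control how long an object without explicit expiry headers is considered fresh; objects that do return the correct headers are cached for the period those headers specify.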
How useful caching can be, especially considering the size of the Internet, is a different matter. With a small cache (a couple of gigabytes of disk space) the returns are remarkably high (up to 25%). This space catches the very frequently accessed sites, such as Netscape, CNN and some of the other biggies. If you double the disk space, though, you don't double your hit rate. This is because you then start trying to catch the rest of the net, which is huge and seldom accessed. A very large cache, 20 gigabytes or so, will probably still get less than a 50% hit rate, unless you are very aggressive about how long you keep data (normally you can't fill 20 gigabytes of disk space, because pages become stale too soon and are deleted).
Whenever we refer to an object in this guide, we actually mean a saved web page or other such downloadable item (an FTP file or a directory listing is also called an object).
The Squid Users guide is copyright Oskar Pearson email@example.com
If you like the layout (I do), I can only thank William Mee and hope he forgives me for stealing it.