Squid Users Guide

Log analysis and stats

Introduction

Squid (in its default configuration) writes four log files:

/usr/local/squid/logs/access.log
/usr/local/squid/logs/cache.log
/usr/local/squid/logs/store.log
/usr/local/squid/cache/log

The first three of these log files can safely be cycled. Although the file in the cache directory is called a log, it is actually an index of every object on disk (when Squid starts it reads this file rather than reading each and every object in the cache). It therefore cannot be cycled, though it does get smaller if you send Squid a 'kill -HUP'.

More info on the format of the logs can be found here, or in the Release Notes included in the distribution. (These are the release notes for version 1.1; if you have a different version, you may want to check the doc directory in the source.)

Quite often the logs start taking up a lot of space. Because of the way the Unix filesystem works, simply deleting the files while Squid is writing to them will not free the space until every program has closed them. You can get Squid to close the files by sending it a 'kill -USR1'. Squid then moves the current access.log, cache.log and store.log to access.log.0, cache.log.0 and store.log.0. If those files already exist, it first moves the old access.log.0 to access.log.1 (and likewise for the other logs) and then creates a new access.log.0 from the current data. The number at the end keeps increasing until it reaches the value set for 'logfile_rotate' in the config file. Once the files have been moved and the new logs are being written, you can either analyse the rotated logs or simply delete them.
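The renaming scheme is easier to follow with a small simulation. The Python sketch below is not Squid's code; it only mimics the description above on a dictionary of file names. The function name rotate and the parameter keep (standing in for 'logfile_rotate') are made up for illustration, and the assumption that generations 0 to keep-1 are retained may not match the exact cutoff Squid uses.

```python
def rotate(logs, base="access.log", keep=3):
    """Simulate one rotation of `base` as described in the text.

    logs: dict mapping file name -> contents (an in-memory stand-in
          for the real files on disk).
    keep: assumed number of numbered generations retained; this plays
          the role of 'logfile_rotate' and must be at least 1.
    """
    logs = dict(logs)  # work on a copy, leave the caller's dict alone
    # The oldest generation falls off the end.
    logs.pop(f"{base}.{keep - 1}", None)
    # Shift the rest upwards, highest number first, so nothing is overwritten.
    for n in range(keep - 2, -1, -1):
        if f"{base}.{n}" in logs:
            logs[f"{base}.{n + 1}"] = logs.pop(f"{base}.{n}")
    # The live log becomes generation 0 and a fresh, empty log is opened.
    if base in logs:
        logs[f"{base}.0"] = logs.pop(base)
    logs[base] = ""
    return logs


before = {"access.log": "day3", "access.log.0": "day2", "access.log.1": "day1"}
after = rotate(before, keep=3)
```

After the call, the data that was in access.log sits in access.log.0, and every older file has moved up by one.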

Now that we know which log files there are, let's see what we can do with them:

logs/access.log
A very useful file: it holds the requests issued to our proxy. From this file we can find out how many people use our cache, how much each one requested, what pages are most popular, etc.
logs/cache.log
In this file we'll find the info Squid wants us to know. Errors, startup messages, etc. are logged in here. It's worth examining this file often.
logs/store.log
This file shows what's happening with our cache diskwise. It shows whenever an object is added or removed from disk.
cache/log
This file contains the mapping of objects (saved pages, for example) to their location on the disk. Normally it is only used when Squid restarts, since Squid keeps almost all of this information in RAM.
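As a rough idea of what can be pulled out of access.log, here is a minimal Python sketch that counts requests and bytes per client and finds the most requested URLs. It assumes the native log layout of whitespace-separated fields in the order time, elapsed, client, code/status, bytes, method, URL; check the release notes for your version before relying on that. The function name summarize and the sample lines are made up for illustration.

```python
from collections import Counter

# Made-up sample lines in the assumed native access.log layout:
# time elapsed client code/status bytes method URL ...
SAMPLE = [
    "868622400.123 250 10.0.0.1 TCP_MISS/200 1024 GET http://www.example.com/ - DIRECT/www.example.com text/html",
    "868622401.456 12 10.0.0.2 TCP_HIT/200 512 GET http://www.example.com/ - NONE/- text/html",
    "868622402.789 300 10.0.0.1 TCP_MISS/200 2048 GET http://www.example.org/logo.gif - DIRECT/www.example.org image/gif",
]


def summarize(lines):
    """Return (requests per client, bytes per client, hits per URL)."""
    requests = Counter()
    bytes_sent = Counter()
    urls = Counter()
    for line in lines:
        fields = line.split()
        # Skip lines that are too short or whose size field is not a number.
        if len(fields) < 7 or not fields[4].isdigit():
            continue
        client, size, url = fields[2], int(fields[4]), fields[6]
        requests[client] += 1
        bytes_sent[client] += size
        urls[url] += 1
    return requests, bytes_sent, urls


requests, bytes_sent, urls = summarize(SAMPLE)
```

For a real log, pass an open file instead of SAMPLE, for example summarize(open("access.log")); urls.most_common(10) then lists the ten most popular pages.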

Tools

Analysis of the file access.log can be done with several tools.

Original NLANR scripts (also contains a list of scripts like this)
usage:
The file "report.txt" then contains all the relevant information. Here is a version that creates html output that also works with the netscape cache logs.
Calamaris
usage: calamaris.pl < access.log > stats.html
squidclients
usage: squidclients -H < access.log > clients.html
squidtimes
usage: squidtimes < access.log > times.html
pwebstats
usage: see webpage
PY_Squid_Stats
usage: see webpage of PY_Squid_Stats
Many more not yet listed here. Mail them

Analysis

These tools may not give you the stats that you need, or they may give you too many stats and take a long time to work them out. I can only suggest that you change them slightly (this is the main advantage of free software :)

Not everyone expects the same thing from their proxy - the most common usage is to save network traffic, though some people use it as an access-control system. Some of the above utilities will give you only a portion of these.

You should run the analysis scripts on a different machine from your cache server, as Squid seems to get very touchy if its disk throughput is very slow. It's probably best to set up a script that copies the logs to another machine in the middle of the night, using something like 'scp -c none cache:/usr/local/squid/logs/access.log.0 .' (don't use rcp, since it has no security).

Note also that some of the utilities above give you information about 'the average time to complete a request'. This figure is a little misleading. When Squid sends data to a client it normally puts the data in a kernel buffer, and the kernel then transmits it to the client. In most cases this buffer is larger than the object being sent, so an object that comes from the disk cache will seem to take a very short time to send: once the data is in the kernel buffer, Squid has no idea how long the kernel takes to deliver it. If you want to know how loaded your cache is, use a completely unloaded machine to request, through the cache, a page that is 'close' to the cache network-wise (such as one on your local web server), and then request the same page from the server directly. The difference in latency shows whether the cache is slowing down the connections. Check the performance section for more details.
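The comparison described above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not a benchmark tool: the function names fetch_time and proxy_overhead are made up, and the URL and proxy address in the usage note are placeholders you would replace with your own local web server and cache.

```python
import statistics
import time
import urllib.request


def fetch_time(url, proxy=None, timeout=10.0):
    """Return the seconds taken to fetch url, optionally through an HTTP proxy."""
    # An empty dict tells ProxyHandler to ignore any proxy set in the environment.
    handler = urllib.request.ProxyHandler({"http": proxy} if proxy else {})
    opener = urllib.request.build_opener(handler)
    start = time.perf_counter()
    with opener.open(url, timeout=timeout) as response:
        response.read()  # pull the whole body so the timing covers the transfer
    return time.perf_counter() - start


def proxy_overhead(url, proxy, samples=5, timer=fetch_time):
    """Median time through the proxy minus median time direct, in seconds."""
    direct = [timer(url) for _ in range(samples)]
    proxied = [timer(url, proxy) for _ in range(samples)]
    return statistics.median(proxied) - statistics.median(direct)
```

Run from an unloaded machine, a call such as proxy_overhead("http://www.example.com/", "http://cache.example.com:3128") returns the extra latency the cache adds; both addresses here are hypothetical. The median is used rather than the mean so that one slow request does not skew the result.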

The Squid Users guide is copyright Oskar Pearson oskar@is.co.za This page is copyright Mark Visser and Oskar Pearson

If you like the layout (I do), I can only thank William Mee, and hope he forgives me for stealing it. This section was almost entirely contributed by Mark Visser (mark@cal026031.student.utwente.nl). Thanks to Mark!