The logs are a valuable source of information about Squid workloads and performance. They record not only access information, but also system configuration errors and resource consumption (e.g., memory, disk space).
There are basically two formats for the access.log file: ``native'' and
``common.'' The Common Logfile Format is used by numerous HTTP servers.
This format consists of the following seven fields:
remotehost rfc931 authuser [date] "method URL" status bytes
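As a rough illustration, a line in the Common Logfile Format can be split into the seven fields above with a regular expression. This is only a sketch in Python: the names CLF_PATTERN and parse_common and the sample line are invented for this example, and the layout of the date inside the brackets is deliberately not interpreted.

```python
import re

# One named group per field:
# remotehost rfc931 authuser [date] "method URL" status bytes
CLF_PATTERN = re.compile(
    r'(?P<remotehost>\S+) (?P<rfc931>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]*)\] "(?P<method>\S+) (?P<url>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_common(line):
    """Return a dict of the seven fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None

# Hypothetical sample line, not taken from a real log.
sample = ('10.0.0.1 - - [11/Oct/1997:04:13:30 +0000] '
          '"GET http://www.example.com/" 200 1234')
fields = parse_common(sample)
```

All values are returned as strings; converting status and bytes to integers is left to the caller because bytes may be logged as "-" by some servers.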
The native format is different for different major versions of Squid.
For Squid-1.0 it is:
time elapsed remotehost code/status/peerstatus bytes method URL
For Squid-1.1, the information from the hierarchy.log was moved
into access.log. The format is:
time elapsed remotehost code/status bytes method URL rfc931 peerstatus/peerhost
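The Squid-1.1 layout above can be read with a simple whitespace split. The following Python sketch assumes that fields are separated by whitespace and that URLs contain no spaces; parse_native, NATIVE_FIELDS, and the sample line are hypothetical names and data used only for illustration.

```python
# Field order of the Squid-1.1 native format shown above. The two
# slash-joined tokens are split into their parts after the first pass.
NATIVE_FIELDS = ["time", "elapsed", "remotehost", "code_status", "bytes",
                 "method", "url", "rfc931", "peer"]

def parse_native(line):
    """Split a Squid-1.1 native access.log line into a dict of fields."""
    parts = line.split()
    if len(parts) != len(NATIVE_FIELDS):
        raise ValueError("expected %d fields, got %d"
                         % (len(NATIVE_FIELDS), len(parts)))
    entry = dict(zip(NATIVE_FIELDS, parts))
    # code/status and peerstatus/peerhost are each one token in the log.
    entry["code"], entry["status"] = entry.pop("code_status").split("/", 1)
    entry["peerstatus"], entry["peerhost"] = entry.pop("peer").split("/", 1)
    entry["elapsed"] = int(entry["elapsed"])   # milliseconds
    entry["bytes"] = int(entry["bytes"])
    return entry

# Hypothetical sample line, not taken from a real log.
sample = ("876543210.123 456 10.0.0.1 TCP_MISS/200 1234 "
          "GET http://www.example.com/ - DIRECT/www.example.com")
entry = parse_native(sample)
```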
The hierarchy.log file exists for Squid-1.0 only. The format is:
[date] URL peerstatus peerhost
The store.log consists of the following fields:
time        The time this entry was logged. The value is the
            raw Unix time plus milliseconds.
action      One of RELEASE, SWAPIN, or SWAPOUT.
            RELEASE means the object has been removed from the cache.
            SWAPOUT means the object has been saved to disk.
            SWAPIN means the object existed on disk and has been
            swapped into memory.
status      The HTTP reply code.

The following three fields are timestamps parsed from the HTTP
reply headers. All are expressed in Unix time. A missing header
is represented as -2 and an unparsable header is represented as -1.

datehdr     The value of the HTTP Date: reply header.
lastmod     The value of the HTTP Last-Modified: reply header.
expires     The value of the HTTP Expires: reply header.

type        The HTTP Content-Type reply header.
expect-len  The value of the HTTP Content-Length reply header.
            Zero if Content-Length was missing.
real-len    The number of bytes of content actually read. If the
            expect-len is non-zero, and not equal to the real-len,
            the object will be released from the cache.
method      The HTTP request method.
key         The cache key. Often this is simply the URL. Cache objects
            which never become public will have cache keys that include
            a unique integer sequence number, the request method, and
            then the URL.
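Two of the rules above are easy to get wrong when post-processing store.log: the special values for the header timestamps, and the length check that causes a release. A minimal Python sketch of both follows. The function names are hypothetical, and the values are assumed to have already been extracted from a log line and converted to integers.

```python
def describe_header_time(value):
    """Interpret a datehdr, lastmod, or expires value as described above."""
    if value == -2:
        return "missing"        # the reply had no such header
    if value == -1:
        return "unparsable"     # the header was present but could not be parsed
    return "unix time %d" % value

def will_be_released(expect_len, real_len):
    """True if the length mismatch rule says the object will be released.

    An expect_len of zero means Content-Length was missing, so no
    comparison is possible and the rule does not apply.
    """
    return expect_len != 0 and expect_len != real_len
```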
These are the definitions for the various log format components:
remotehost  The IP address of the client host. In Squid-1.1, if the log_fqdn option is enabled, full hostnames will be logged when available.
rfc931      The username associated with the client connection, determined from an Ident (RFC 931) server running on the client host. By default Ident lookups are not made, but may be enabled with the ident_lookup option.
authuser    Always NULL ("-") for Squid logs.
method      GET, HEAD, POST, etc. for HTTP requests. ICP_QUERY for ICP requests.
URL         The requested URL.
code        The ``cache result'' of the request. This describes if the request was a cache hit or miss, and if the object was refreshed. See the full list of cache result codes.
status      HTTP status code: 200 for successful actions, 000 for UDP requests, 302 for redirects, 500 for server errors, etc. See the HTTP status codes for a complete list.
bytes       The number of bytes delivered to the client.
peerstatus  A status code that explains how the request was forwarded, either to your peer (neighbor) caches, or directly to the origin server.
peerhost    The host to which the request was forwarded.
time        Unix timestamp (since Jan 1, 1970) with millisecond resolution.
date        HTTP date format:
elapsed     The time elapsed (milliseconds) during the client connection. For HTTP requests, this is the time between the accept() and close() system calls for the TCP socket. For ICP requests, this represents the time between scheduling the reply message for sending and actually sending it.
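The Unix timestamp with millisecond resolution used in the native format is not very readable as logged. One possible way to convert it is sketched below in Python; parse_squid_time is a hypothetical helper, and the sketch assumes the value is written as seconds, a dot, and up to three fractional digits.

```python
from datetime import datetime, timedelta, timezone

def parse_squid_time(field):
    """Convert a 'seconds.milliseconds' timestamp to an aware UTC datetime."""
    seconds, _, fraction = field.partition(".")
    # Pad or trim the fraction to exactly three digits (milliseconds).
    millis = int(fraction.ljust(3, "0")[:3]) if fraction else 0
    base = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    return base + timedelta(milliseconds=millis)

stamp = parse_squid_time("876543210.123")
```

The seconds and the fraction are handled as integers so that no precision is lost to floating-point rounding.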
Note: "TCP_" refers to requests on the HTTP port (3128).
TCP_HIT             A valid copy of the requested object was in the cache.
TCP_MISS            The requested object was not in the cache.
TCP_REFRESH_HIT     The object was in the cache, but STALE. An
                    If-Modified-Since request was made and a
                    "304 Not Modified" reply was received.
TCP_REF_FAIL_HIT    The object was in the cache, but STALE. The request
                    to validate the object failed, so the old (stale)
                    object was returned.
TCP_REFRESH_MISS    The object was in the cache, but STALE. An
                    If-Modified-Since request was made and the reply
                    contained new content.
TCP_CLIENT_REFRESH  The client issued a request with the "no-cache" pragma.
TCP_IMS_HIT         The client issued an If-Modified-Since request and
                    the object was in the cache and still fresh.
TCP_IMS_MISS        The client issued an If-Modified-Since request for
                    a stale object.
TCP_SWAPFAIL        The object was believed to be in the cache, but
                    could not be accessed.
TCP_DENIED          Access was denied for this request.
"UDP_" refers to requests on the ICP port (3130).
UDP_HIT        A valid copy of the requested object was in the cache.
UDP_HIT_OBJ    Same as UDP_HIT, but the object data was small enough
               to be sent in the UDP reply packet. Saves the
               following TCP request.
UDP_MISS       The requested object was not in the cache.
UDP_DENIED     Access was denied for this request.
UDP_INVALID    An invalid request was received.
UDP_RELOADING  The ICP request was "refused" because the cache is
               busy reloading its metadata.
"ERR_" refers to various types of errors for HTTP requests.
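When summarizing a log, the result codes are usually grouped by prefix and by whether they count as hits. The Python sketch below shows one way to do that. Treating every code that contains "HIT" as a cache hit is an assumption made for this example, not a rule stated by Squid, and the helper names are hypothetical.

```python
def request_kind(code):
    """Map the result-code prefix to the kind of request it describes."""
    if code.startswith("TCP_"):
        return "HTTP"    # request on the HTTP port
    if code.startswith("UDP_"):
        return "ICP"     # request on the ICP port
    if code.startswith("ERR_"):
        return "error"
    return "other"

def is_hit(code):
    """Assumption: any result code containing 'HIT' is counted as a hit."""
    return "HIT" in code

def hit_ratio(codes):
    """Fraction of the given result codes that are hits (0.0 if empty)."""
    codes = list(codes)
    if not codes:
        return 0.0
    return sum(1 for code in codes if is_hit(code)) / len(codes)
```

For example, hit_ratio(["UDP_HIT", "UDP_MISS"]) gives 0.5. A byte-weighted hit ratio would need the bytes field as well and is not covered here.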
Hierarchy Data Tags
DIRECT               The object has been requested from the origin
                     server.
FIREWALL_IP_DIRECT   The object has been requested from the origin
                     server because the origin host IP address is
                     inside your firewall.
FIRST_PARENT_MISS    The object has been requested from the
                     parent cache with the fastest weighted round
                     trip time.
FIRST_UP_PARENT      The object has been requested from the first
                     available parent in your list.
LOCAL_IP_DIRECT      The object has been requested from the origin
                     server because the origin host IP address
                     matched your 'local_ip' list.
SIBLING_HIT          The object was requested from a sibling cache
                     which replied with a UDP_HIT.
NO_DIRECT_FAIL       The object could not be requested because
                     of firewall restrictions and no parent caches
                     were available.
NO_PARENT_DIRECT     The object was requested from the origin server
                     because no parent caches exist for the URL.
PARENT_HIT           The object was requested from a parent cache
                     which replied with a UDP_HIT.
SINGLE_PARENT        The object was requested from the only
                     parent cache appropriate for this URL.
SOURCE_FASTEST       The object was requested from the origin server
                     because the 'source_ping' reply arrived first.
PARENT_UDP_HIT_OBJ   The object was received in a UDP_HIT_OBJ reply
                     from a parent cache.
SIBLING_UDP_HIT_OBJ  The object was received in a UDP_HIT_OBJ reply
                     from a sibling cache.
PASSTHROUGH_PARENT   The neighbor or proxy defined in the config
                     option 'passthrough_proxy' was used.
SSL_PARENT_MISS      The neighbor or proxy defined in the config
                     option 'ssl_proxy' was used.
DEFAULT_PARENT       No ICP queries were sent to any parent
                     caches. This parent was chosen because
                     it was marked as 'default' in the config
                     file.
ROUNDROBIN_PARENT    No ICP queries were sent to any parent
                     caches. This parent was chosen because
                     it was marked as 'round-robin' in the config
                     file and it had the lowest round-robin use
                     count.
CLOSEST_PARENT_MISS  This parent was selected because it
                     included the lowest RTT measurement to
                     the origin server. This only appears
                     with 'query_icmp on' set in the config
                     file.
CLOSEST_DIRECT       The object was fetched directly from the
                     origin server because this cache measured
                     a lower RTT than any of the parent caches.
Almost any of these may be preceded by 'TIMEOUT_' if the two-second (default) timeout occurs while waiting for all ICP replies to arrive from neighbors.
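Because of the optional 'TIMEOUT_' prefix, a script that counts hierarchy tags should strip the prefix before comparing tag names. A small Python sketch, with the hypothetical helper name split_timeout:

```python
TIMEOUT_PREFIX = "TIMEOUT_"

def split_timeout(tag):
    """Return (timed_out, base_tag) for a hierarchy data tag."""
    if tag.startswith(TIMEOUT_PREFIX):
        return True, tag[len(TIMEOUT_PREFIX):]
    return False, tag
```

With this, 'TIMEOUT_DIRECT' and 'DIRECT' are both counted under DIRECT, while the first element of the result records whether the ICP timeout occurred.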
These HTTP status codes are taken from RFC 2068.
100 Continue
101 Switching Protocols
200 OK
201 Created
202 Accepted
203 Non-Authoritative Information
204 No Content
205 Reset Content
206 Partial Content
300 Multiple Choices
301 Moved Permanently
302 Moved Temporarily
303 See Other
304 Not Modified
305 Use Proxy
400 Bad Request
401 Unauthorized
402 Payment Required
403 Forbidden
404 Not Found
405 Method Not Allowed
406 Not Acceptable
407 Proxy Authentication Required
408 Request Time-out
409 Conflict
410 Gone
411 Length Required
412 Precondition Failed
413 Request Entity Too Large
414 Request-URI Too Large
415 Unsupported Media Type
500 Internal Server Error
501 Not Implemented
502 Bad Gateway
503 Service Unavailable
504 Gateway Time-out
505 HTTP Version not supported
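For reporting, the individual codes are often collapsed into the five classes that RFC 2068 defines by the first digit. The Python sketch below does this and treats 000, which Squid logs for UDP requests, as a separate case. The names STATUS_CLASSES and status_class and the label "udp" are choices made for this example.

```python
# Classes defined by the first digit of the status code (RFC 2068).
STATUS_CLASSES = {
    1: "informational",
    2: "successful",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def status_class(status):
    """Return the class name for a status code given as a string or int."""
    code = int(status)
    if code == 0:
        return "udp"    # 000 is logged for UDP (ICP) requests
    return STATUS_CLASSES.get(code // 100, "unknown")
```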
The cache/log file has a rather unfortunate name. It is also often called the swap log. It is a record of every cache object written to disk. It is read when Squid starts up to ``reload'' the cache. If you remove this file, you will effectively wipe out your cache contents.
For Squid-1.1, there are six fields:
The best way to maintain Squid log files is to send the squid process a USR1 signal. This causes the current log files to be closed and renamed. You can then remove any of the old log files. For example, if your squid.pid file is /usr/local/squid/logs/squid.pid (as defined in your squid.conf file), you would do:
kill -USR1 `cat /usr/local/squid/logs/squid.pid`
NOTE: The logfile_rotate line in squid.conf makes it generally
unnecessary to delete logfiles by hand. Just set logfile_rotate to the
number of old logs you want saved. Each time the value of
logfile_rotate is reached, the oldest log will be deleted
automatically. You may find it useful to set logfile_rotate once and
then set up a crontab to send squid the USR1 signal.
The following crontab entry would tell Squid to rotate the logs
every day at midnight:
0 0 * * * /bin/kill -USR1 `cat /usr/local/squid/logs/squid.pid`
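If rotation is triggered from a larger maintenance script rather than directly from cron, the same signal can be sent from Python. This is a sketch only, for Unix systems: read_pid and rotate_logs are hypothetical helper names, and the default path is the example location used above, which may differ on your system.

```python
import os
import signal

def read_pid(pid_file):
    """Read the process ID stored in a squid.pid file."""
    with open(pid_file) as handle:
        return int(handle.read().strip())

def rotate_logs(pid_file="/usr/local/squid/logs/squid.pid"):
    """Send USR1 to Squid, equivalent to: kill -USR1 `cat squid.pid`."""
    os.kill(read_pid(pid_file), signal.SIGUSR1)
```

Calling rotate_logs() raises an exception if the pid file is missing or the process no longer exists, which a wrapper script can use to report that Squid is not running.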
The only logfile you should never delete is the file cleverly named
log, which normally exists in the first cache_dir directory. This file
contains the metadata needed to rebuild the cache when squid starts up.
Deleting this file effectively wipes out your cache.
This message means that the requested object was in ``Delete Behind'' mode and the user aborted the transfer. An object will go into ``Delete Behind'' mode if
This means that a timeout occurred while the object was being transferred. Most likely the retrieval of this object was very slow (or it stalled before finishing) and the user aborted the request. However, depending on your settings for quick_abort, Squid may have continued to try retrieving the object. Squid imposes a maximum amount of time on all open sockets, so after some amount of time the stalled request was aborted and logged with an ERR_LIFETIME_EXP message.
I've been asked to retrieve an object which was accidentally destroyed at the source for recovery. So, how do I figure out where the things are so I can copy them out and strip off the headers?
The following method applies only to the Squid-1.1 versions:
Use grep to find the named object (URL) in the cache/log file. The first field in this file is an integer file number.
Then, find the file fileno-to-pathname.pl in the ``scripts''
directory of the Squid source distribution. The usage is:
perl fileno-to-pathname.pl [-c squid.conf]
File numbers are read on stdin, and pathnames are printed on stdout.
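The grep step can also be done with a few lines of Python, which makes it easy to pass only the file numbers on to fileno-to-pathname.pl. In this sketch find_file_numbers is a hypothetical helper. It relies only on the first field being the file number and makes no assumption about the remaining fields of the cache/log file.

```python
def find_file_numbers(log_lines, url):
    """Yield the first field of every cache/log line that mentions url."""
    for line in log_lines:
        if url in line:
            fields = line.split()
            if fields:
                # Kept as a string so it is passed on exactly as logged.
                yield fields[0]
```

Printing each yielded value on its own line produces input suitable for the stdin of fileno-to-pathname.pl. As with grep, the match is a plain substring test, so a short URL can also match longer URLs that contain it.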