Published on the O'Reilly Network
http://www.oreillynet.com/pub/a/apache/2000/03/10/log_rhythms.html

Log Rhythms
by Rael Dornfest
03/10/2000

Logs are the pulse of your web server: the rhythms produced by the comings and goings of your visitors. In this column I'll give you a gentle introduction to Apache web server logs and their place in monitoring, security, marketing, and feedback. Before you go running for the hills, rest assured that I won't be talking about those mathematical logarithms that gave you a headache in high school.

Your web server records visits to your web site in the form of logs: a text file (or files) containing an entry for each request (or "hit"). At first glance, logs may look convoluted, but they're actually quite simple. Once you're familiar with the notation, you'll be reading your logs as easily as your daily journal.

"One Hit Wonder" or Lasting Impression?

Before we dive in, let's get our terminology straight.

Hit
Every request your server receives, whether for a page, an image, or any other file, counts as one hit. Hit counts lost their effectiveness as people began to add gratuitous images to their pages to inflate their sites' perceived popularity. Hit counts are, however, useful to server administrators as a rough indicator of traffic or server utilization.

Page View
Content providers track page view counts to figure out which content is most interesting to their audience. For example, say an article on Internet marketing generated 1,024 page views, whereas another on door-to-door sales generated only 42. One could reasonably guess that the site's audience is far more interested in marketing than in sales (at least the door-to-door kind).

As another example, let's assume my article is spread across four pages with "next page" links at the bottom of the first three. A particularly telling page view spread would be: Page 1 (456 views), Page 2 (345 views), Page 3 (93 views), and Page 4 (12 views).
I would conclude that my audience, while interested in the topic overall, lost interest in my article somewhere in the second page: page 2 drew 345 views, but page 3 drew only 93.

Marketers like to use page view counts as popularity indicators. But the assumption that each page view equals a unique person is almost certainly incorrect. For example, 100 page views could signify either 100 people visiting the page once or one person visiting the page 100 times.

Unique Host vs. Unique Visitor
A visit from a unique host doesn't necessarily equal a visit from a unique visitor. Perhaps the host in question is a computer sitting in a public library; over the course of a day, several users of that computer may visit the same site or even the same page (think Yahoo).

Then there's the issue of the dynamic host. When you dial into your Internet service provider (ISP) via modem, your computer's unique identifier (IP address) is, in all probability, assigned dynamically. If you hang up and dial in again, there's no guarantee that you'll receive the same identifier. So what looks like a unique host in your log file may actually be several visitors who just happened to be allocated the same IP address at different times.

The bottom line is this: hits, page views, and host visits give you only a general picture of your web site's visitors and traffic patterns. The generally agreed-upon way to properly tag and track a unique user is to use cookies (or "magic cookies"): snippets of identifying information that are sent along with the user's request and the server's response. For more information about cookies, visit the Resources section at the end of this article.

Impression
The Access Log

Let's see what's lurking inside that log.
For the purposes of this look at a typical set of logs, I'm
assuming your Apache server has been configured to use
Common Log Format (CLF), the default in a fresh Apache installation. Your
Look at your access log, the location of which will
depend upon your layout preferences and installation method.
The Apache 1.3.9 RPM installation under Red Hat 6.1 places logs in an
Let's zoom in on one fairly representative line in a log:
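Because every CLF entry has the same fixed fields (remote host, identity, authenticated user, timestamp, request line, status code, and bytes sent), a log can be picked apart mechanically, and the hit, page view, and unique host counts discussed earlier fall out of it. The following is a minimal Python sketch, not a full log analyzer. It assumes well-formed CLF lines, and it uses a hypothetical helper, is_page(), that treats any path ending in /, .html, or .htm as a page; a real site would need its own rule. The sample lines are invented for illustration and use documentation-reserved IP addresses.

```python
import re
from collections import Counter

# Common Log Format fields: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


def parse_clf(line):
    """Split one CLF line into named fields; return None if it doesn't match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # CLF records "-" instead of 0 when no response body was sent.
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields


def is_page(path):
    """Hypothetical heuristic: a directory index or an .html/.htm file is a page."""
    return path.endswith("/") or path.endswith((".html", ".htm"))


def summarize(lines):
    """Return (hits, page views per path, number of unique hosts)."""
    hits = 0
    page_views = Counter()
    hosts = set()
    for line in lines:
        entry = parse_clf(line)
        if entry is None:
            continue  # skip lines that aren't valid CLF
        hits += 1  # every request counts as a hit, images included
        hosts.add(entry["host"])
        # The request line looks like: METHOD PATH PROTOCOL
        parts = entry["request"].split()
        if len(parts) >= 2 and is_page(parts[1]):
            page_views[parts[1]] += 1
    return hits, page_views, len(hosts)


# Invented sample lines (documentation-reserved addresses), not real traffic.
sample = [
    '192.0.2.7 - - [10/Mar/2000:13:55:36 -0800] "GET /articles/page1.html HTTP/1.0" 200 2326',
    '192.0.2.7 - - [10/Mar/2000:13:55:37 -0800] "GET /images/logo.gif HTTP/1.0" 200 1045',
    '198.51.100.3 - - [10/Mar/2000:13:56:02 -0800] "GET /articles/page1.html HTTP/1.0" 200 2326',
    '198.51.100.3 - - [10/Mar/2000:13:57:10 -0800] "GET /articles/page2.html HTTP/1.0" 304 -',
]

hits, views, unique_hosts = summarize(sample)
print(hits, dict(views), unique_hosts)
# prints: 4 {'/articles/page1.html': 2, '/articles/page2.html': 1} 2
```

The image request counts as a hit but not as a page view, which is exactly why hit counts are so easy to inflate. The unique host figure counts IP addresses, not people, so a shared library computer or a dynamically assigned dial-up address will blur it.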
Logging in Apache (version 1.2 and later) is handled by the Apache module,
mod_log_config,
which enables you to customize how your logs look and work. Your
Each log format starts out with the LogFormat directive, followed by a string of tokens that describe how each line of the log file should look, and ends with a nickname given to the format. The mod_log_config documentation includes a comprehensive list of tokens and their meanings.

How you want your logs displayed, and into how many files you want them sorted, is up to you. Some site authors separate log files into referrer and agent logs. I prefer to use the "combined" log format and keep everything in one place.

Let's say I wish to use "common" log format, but also want to keep track of who is linking to my site. I could just use "combined" format, but I don't really care what type of browser (agent) my visitor is using.
Instead, I'll create a new
Now that I've defined my preferred log format, I need to tell Apache to use this format. Using my "commonish" log format above:
where

We've only just scratched the surface of log customization. For much more, be sure to read the detailed mod_log_config documentation.
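Each LogFormat token stands for one piece of information about the request. To show how a format string turns into a log line, here is a small Python sketch that expands a handful of tokens: %h (remote host), %l (remote logname), %u (remote user), %t (time), %r (first line of the request), %>s (final status), %b (bytes sent, "-" for none), and %{Referer}i (the Referer request header). It is an illustration only, not mod_log_config's implementation. The "common" string is the standard one; the "commonish" string is a hypothetical reconstruction of the format described above, assumed to be "common" plus a quoted referrer, and the request record is made up.

```python
import re
from datetime import datetime, timedelta, timezone

# Standard "common" format; the "commonish" string is a hypothetical
# reconstruction: common plus the quoted Referer header, no User-Agent.
COMMON = '%h %l %u %t "%r" %>s %b'
COMMONISH = '%h %l %u %t "%r" %>s %b "%{Referer}i"'

# Matches %{Header}i, %>s, or one of the single-letter tokens handled below.
TOKEN = re.compile(r"%(?:\{([^}]+)\}i|(>s|[hlutrb]))")


def expand(log_format, req):
    """Render one log line from a request record (a small subset of tokens)."""
    simple = {
        "h": req["host"],
        "l": "-",  # remote logname (identd); normally unavailable
        "u": req.get("user") or "-",
        "t": req["time"].strftime("[%d/%b/%Y:%H:%M:%S %z]"),
        "r": req["request_line"],
        ">s": str(req["status"]),
        "b": str(req["bytes"]) if req["bytes"] else "-",
    }

    def substitute(match):
        header, token = match.group(1), match.group(2)
        if header is not None:
            # Missing headers are written as "-".
            return req["headers"].get(header, "-")
        return simple[token]

    return TOKEN.sub(substitute, log_format)


# A made-up request record for illustration.
req = {
    "host": "192.0.2.7",
    "user": None,
    "time": datetime(2000, 3, 10, 13, 55, 36, tzinfo=timezone(timedelta(hours=-8))),
    "request_line": "GET /index.html HTTP/1.0",
    "status": 200,
    "bytes": 2326,
    "headers": {
        "Referer": "http://www.example.com/links.html",
        "User-Agent": "Mozilla/4.7",
    },
}

print(expand(COMMON, req))
# prints: 192.0.2.7 - - [10/Mar/2000:13:55:36 -0800] "GET /index.html HTTP/1.0" 200 2326
print(expand(COMMONISH, req))
# prints the same line followed by "http://www.example.com/links.html"
```

The User-Agent header is present in the request but never appears in either line, since neither format string asks for it; that is the whole point of choosing a "commonish" format over "combined".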
Bump on a Log: The Error Log

In addition to access logs, Apache notes unusual server activity
in an error log. In your
The first line tells Apache where to log errors. The second line sets the
threshold for what types of errors to log. The default

The contents of the Apache error log are pretty clear. For example,
someone requesting an HTML document,
Restarting your Apache server generates:
The error log is a very useful tool for:
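Apache's error levels form a ranked scale, and the LogLevel threshold means that a message is written only if it is at least as severe as the configured level. The following Python sketch models just that ordering, from emerg (most severe) down to debug, and applies it to already-written lines to show which ones a given threshold would keep. It assumes each entry begins with a bracketed date followed by a bracketed level; the sample messages are invented placeholders, not real Apache output, and the sketch ignores any special cases the server itself may apply.

```python
import re

# Apache LogLevel names, ordered from most to least severe.
LEVELS = ["emerg", "alert", "crit", "error", "warn", "notice", "info", "debug"]

# Assumed entry layout: [date] [level] message
ENTRY = re.compile(r"^\[(?P<date>[^\]]+)\] \[(?P<level>[a-z]+)\] (?P<message>.*)$")


def passes(level, threshold):
    """True if a message at `level` is at least as severe as `threshold`."""
    return LEVELS.index(level) <= LEVELS.index(threshold)


def filter_entries(lines, threshold):
    """Keep (level, message) pairs that the given threshold would let through."""
    kept = []
    for line in lines:
        m = ENTRY.match(line)
        if m and m.group("level") in LEVELS and passes(m.group("level"), threshold):
            kept.append((m.group("level"), m.group("message")))
    return kept


# Placeholder entries invented for illustration; not real Apache messages.
sample = [
    "[Fri Mar 10 13:55:36 2000] [error] example: a requested file was missing",
    "[Fri Mar 10 14:02:11 2000] [notice] example: the server was restarted",
    "[Fri Mar 10 14:05:00 2000] [debug] example: verbose diagnostic detail",
]

print(filter_entries(sample, "warn"))    # only the [error] entry passes
print(filter_entries(sample, "notice"))  # the [error] and [notice] entries pass
```

Lowering the threshold toward debug makes the log more talkative; raising it toward emerg keeps only the most serious problems.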
Resources

The following is a list of starting points from which to further explore some of the topics covered in this article.
Tune in Next Time...

Apache and mod_perl, RPM-Style.
oreillynet.com Copyright © 2003 O'Reilly & Associates, Inc.