9.3. Stateful and Stateless Data
In addition to the format in which the log file presents its data, it is important to think about the contents of these files because what the data represents and how it is represented both contribute to our plan of attack when programming. With log file contents, often a distinction can be made between data that is stateful and data that is stateless. Let's see a couple of examples that will make this distinction clear.
Here's a three-line snippet from an Apache web server log. Each line represents a request answered by the web server:
esnet-118.dynamic.rpi.edu - - [13/Dec/1998:00:04:20 -0500] "GET home/u1/tux/ tuxedo05.gif HTTP/1.0" 200 18666 ppp-206-170-3-49.okld03.pacbell.net - - [13/Dec/1998:00:04:21 -0500] "GET home/u2/news.htm HTTP/1.0" 200 6748 ts007d39.ftl-fl.concentric.net - - [13/Dec/1998:00:04:22 -0500] "GET home/u1/bgc.jpg HTTP/1.1" 304 -
Here are a few lines from a printer daemon log file:
Aug 14 12:58:46 warhol printer: cover/door open Aug 14 12:58:58 warhol printer: error cleared Aug 14 17:16:26 warhol printer: offline or intervention needed Aug 14 17:16:43 warhol printer: error cleared Aug 15 20:40:45 warhol printer: paper out Aug 15 20:40:48 warhol printer: error cleared
In both cases, each line of the log file is independent of every other line in the file. We can find patterns or aggregate lines together gathering statistics, but there's nothing inherent in the data that connects the log file entries to each other.
Now consider some slightly doctored entries from a sendmail mail log:
Dec 13 05:28:27 mailhub sendmail: FAA26690: from=<email@example.com. com>, size=643, class=0, pri=30643, nrcpts=1, msgid=<199812131032.CAA22824@has.a. godcomplex.com>, proto=ESMTP, firstname.lastname@example.org [220.127.116.11] Dec 13 05:29:13 mailhub sendmail: FAA26695: from=<email@example.com>, size=9600, class=0, pri=39600, nrcpts=1, msgid=<199812131029.FAA15005@host.ccs. neu.edu>, proto=ESMTP, firstname.lastname@example.org [18.104.22.168] Dec 13 05:29:15 mailhub sendmail: FAA26690: to=<email@example.com>, delay=00:00:02, xdelay=00:00:01, mailer=local, stat=Sent Dec 13 05:29:19 mailhub sendmail: FAA26695: to="|IFS=' '&&exec /usr/bin/ procmail -f-||exit 75 #user", ctladdr=user (6603/104), delay=00:00:06, xdelay=00: 00:06, mailer=prog, stat=Sent
Unlike the previous examples, there is a definite connection between the lines in the file. Figure 9-2 makes that connection explicit.
Figure 9.2. Related entries in the sendmail log
Each line has at least one partner entry that shows the source and destinations of each message. When a message enters the system it is assigned a unique "Message-ID," highlighted above, which identifies that message while it is in play. This Message-ID allows us to associate related lines in an interleaved log file, essentially giving a message an existence or "state" in between entries of a log file.
Sometimes we care about the "distance" between state transitions. Take, for instance, the wtmp file we saw earlier in this chapter. Not only are we interested in when someone logs in and when they log out (the two state transitions in the log), but in the time between these two events, i.e., how long they were logged in.
The most sophisticated log files can add another twist. Here are some excerpts from a POP (Post Office Protocol) server's log file while the server is in debug mode. The names and IP addresses have been changed to protect the innocent:
Jan 14 15:53:45 mailhub popper: Debugging turned on Jan 14 15:53:45 mailhub popper: (v2.53) Servicing request from "client" at 129.X.X.X Jan 14 15:53:45 mailhub popper: +OK QPOP (version 2.53) at mailhub starting. Jan 14 15:53:45 mailhub popper: Received: "USER username" Jan 14 15:53:45 mailhub popper: +OK Password required for username. Jan 14 15:53:45 mailhub popper: Received: "pass xxxxxxxxx" Jan 14 15:53:45 mailhub popper: +OK username has 1 message (26627 octets). Jan 14 15:53:46 mailhub popper: Received: "LIST" Jan 14 15:53:46 mailhub popper: +OK 1 messages (26627 octets) Jan 14 15:53:46 mailhub popper: Received: "RETR 1" Jan 14 15:53:46 mailhub popper: +OK 26627 octets <message text appears here> Jan 14 15:53:56 mailhub popper: Received: "DELE 1" Jan 14 15:53:56 mailhub popper: Deleting message 1 at offset 0 of length 26627 Jan 14 15:53:56 mailhub popper: +OK Message 1 has been deleted. Jan 14 15:53:56 mailhub popper: Received: "QUIT" Jan 14 15:53:56 mailhub popper: +OK Pop server at mailhub signing off. Jan 14 15:53:56 mailhub popper: (v2.53) Ending request from "user" at (client) 129.X.X.X
Not only do we encounter connections ("Servicing request from...") and disconnections ("Ending request from..."), but we have information detailing what took place in between these state transitions.
Each of these middle events also provides potentially useful "distance" information. If there was a problem with our POP server, we might look to see how long each step in the above output took.
In the case of an FTP server, you may be able to draw some conclusions from this data about how people interact with your site. On average, how long do people stay connected before they transfer files? Do they pause between commands for a long time? Do they always travel from one part of your site to another before downloading the same file? The interstitial data can be a rich source of information.
Copyright © 2001 O'Reilly & Associates. All rights reserved.