$ gawk 'BEGIN { print strftime("Today is %A, %B %d, %Y") }'
Today is Sunday, May 05, 1996
The list of available formats is quite long.
See your local strftime(3) manpage, and the gawk documentation
for the full list.
Our hypothetical CGI log file might be processed by this program:
# cgiformat --- process CGI logs
# data format is user:host:timestamp

#1
BEGIN { FS = ":"; SUBSEP = "@" }

#2
{
    # make data more obvious
    user = $1; host = $2; time = $3

    # store first contact by this user
    if (! ((user, host) in first))
        first[user, host] = time

    # count contacts
    count[user, host]++

    # save last contact
    last[user, host] = time
}

#3
END {
    # print the results
    for (contact in count) {
        i = strftime("%y-%m-%d %H:%M", first[contact])
        j = strftime("%y-%m-%d %H:%M", last[contact])
        printf "%s -> %d times between %s and %s\n",
            contact, count[contact], i, j
    }
}
The first step is to set FS to ":" to split the fields correctly.
We also use a neat trick and set the subscript separator to "@", so that
the arrays become indexed by "user@host" strings.
In the second step, we check whether this is the first time we've seen
this user at this host. If so (the pair is not yet in the first array),
we record the timestamp there.
Then we increment the count of how many times they've connected. Finally,
we store this record's timestamp in the last array. This element
is overwritten each time we see another connection by the same user.
That's OK; what we end up with is the last connection in the log, which
is the most recent one as long as the log is in chronological order.
The END procedure formats the data for us.
It loops through the count
array, formatting the timestamps in the first and last arrays
for printing.
Consider a log file with the following records in it:
$ cat /var/log/cgi/querylog
arnold:some.domain.com:831322007
mary:another.domain.org:831312546
arnold:some.domain.com:831327215
mary:another.domain.org:831346231
arnold:some.domain.com:831324598
Here's what running the program produces:
$ gawk -f cgiformat.awk /var/log/cgi/querylog
mary@another.domain.org -> 2 times between 96-05-05 12:09 and 96-05-05 21:30
arnold@some.domain.com -> 3 times between 96-05-05 14:46 and 96-05-05 15:29