20.12. Parsing a Web Server Log File

Problem

You want to extract from a web server log file only the information you're interested in.

Solution

Pull apart the log file as follows:

    while (<LOGFILE>) {
        my ($client, $identuser, $authuser, $date, $time, $tz, $method,
            $url, $protocol, $status, $bytes) =
        /^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.*?) (\S+)" (\S+) (\S+)$/;
        # ...
    }

Discussion

This regular expression pulls apart entries in Common Log Format, an informal standard that most web servers adhere to. The fields are, in order: the client address, the ident user, the authenticated user, the date, time, and time zone of the request, the request method, URL, and protocol, the response status, and the number of bytes sent.
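To see which part of an entry lands in which variable, here is a minimal sketch that applies the pattern to a single line, with the closing bracket of the timestamp matched explicitly. The sample entry, the @names list, and the %entry hash are illustrative assumptions, not part of the recipe.

```perl
use strict;
use warnings;

# Hypothetical sample entry in Common Log Format (illustrative only).
my $line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
         . '"GET /apache_pb.gif HTTP/1.0" 200 2326';

# Field names in the order the pattern captures them.
my @names = qw(client identuser authuser date time tz
               method url protocol status bytes);

# A hash slice assignment in scalar context yields the number of
# captured values, so a failed match (zero values) triggers the die.
my %entry;
@entry{@names} = $line =~
    /^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.*?) (\S+)" (\S+) (\S+)$/
    or die "Line is not in Common Log Format: $line\n";

# Print one field per line, name first.
printf "%-10s %s\n", $_, $entry{$_} for @names;
```

Storing the captures in a hash rather than eleven scalars is only a convenience for printing; inside a while (<LOGFILE>) loop the list assignment from the Solution works the same way.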
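Other formats include the referrer and agent information. The pattern needs only minor changes to work with other log file formats. Watch out: spaces in the URL field are not escaped. This means that we can't use \S+ to extract the URL. The pattern instead uses the non-greedy .*? and relies on the rest of the pattern, anchored to the end of the line with $, to decide where the URL stops.

See Also

The CLF spec at http://www.w3.org/Daemon/User/Config/Logging.html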