Parsing lines from a log file containing date-time greater than something

Submitted by 穿精又带淫゛_ on 2019-12-02 22:47:01

Question


I have log files on the order of several hundred MB each. Every line begins with date-time information, like this:

[Tue Oct  4 11:55:19 2016] [hphp] [25376:7f5d57bff700:279809:000001] [] \nFatal error: syntax error, unexpected T_ENCAPSED_AND_WHITESPACE, expecting ')' in /var/cake_1.2.0.6311-beta/app/webroot/openx/www/delivery/postGetAd.php(12479)(62110d90541a84df30dd077ee953e47c) : eval()'d code on line 1

I have a plugin (nagios check_logwarn) to print out only those lines which contain some of the error strings. Following is the command to run it:

/usr/local/nagios/libexec/check_logwarn -d /tmp/logwarn -p /mnt/log/hiphop/error_20161003.log "^.*Fatal error*" 

I want to filter further by date-time, keeping only the lines that come after, say, 11:55:10.

I am not sure whether to use regex for this. Following is what I have so far:

/usr/local/nagios/libexec/check_logwarn -d /tmp/logwarn -p /mnt/log/hiphop/error_20161003.log "^.*Fatal error*" | grep "15\:19\:1*"

But this will only filter those logs whose time is in the 19th minute of the 15th hour.

Update

I am now able to compare the time part of the date-time.

/usr/local/nagios/libexec/check_logwarn -d /tmp/logwarn -p /mnt/log/hiphop/error_20161004.log "^.*Fatal error*" | awk '$4 > "14:22:11"'

How do I compare the day part?

Update 2 - opening bounty

I am having to open a bounty because I do not have much expertise with shell and I need a solution soon.

I am stuck at the part of comparing the dates. With the solution at https://stackoverflow.com/a/39856560/351903, I am facing this problem. If that is fixed, I would be happy.

I am also open to some enhancement to this (I don't mind if the output has some jumbled up order of logs) -

/usr/local/nagios/libexec/check_logwarn -d /tmp/logwarn -p /mnt/log/hiphop/error_20161004.log "^.*Fatal error*" | awk '$4 > "14:22:11"'

I looked for some date-time to timestamp comparison, but couldn't find something working.

I am not able to proceed from what is given in this question. I cannot see the timestamp value using this -

echo date -d '06/12/2012 07:21:22' +"%s"

Not sure what I am missing.


Answer 1:


You Need Comparable Date Representations

Regular expressions are okay for extracting data, but a terrible way to compare dates to one another. You actually need to convert your timestamps to something comparable, such as Epoch time or DateTime objects. If you want to find all the lines that contain a timestamp greater than some other timestamp, you need to parse out the timestamp in each line for comparison.
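To illustrate the point, here is a minimal sketch. The two sample timestamps and the helper names time_of_day and epoch_seconds are made up for illustration. It shows how comparing only the time-of-day string goes wrong across a day boundary, while a parsed representation compares correctly.

```ruby
require 'date'

# Hypothetical helper: the time-of-day field of a log-style timestamp.
def time_of_day(stamp)
  stamp.split[3]                      # e.g. "11:55:19"
end

# Hypothetical helper: the full timestamp as seconds since the epoch.
def epoch_seconds(stamp)
  DateTime.parse(stamp).to_time.to_i
end

# Made-up sample values: the second is a day later but earlier in the day.
earlier = 'Mon Oct  3 23:10:00 2016'
later   = 'Tue Oct  4 11:55:19 2016'

# Comparing only the time-of-day strings gets the order wrong across days.
puts time_of_day(later) > time_of_day(earlier)      # false

# Epoch seconds take the full date and time into account.
puts epoch_seconds(later) > epoch_seconds(earlier)  # true
```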

A Ruby Example

#!/usr/bin/env ruby

require 'date'

# Convert your given timestamp to something comparable.
timestamp = DateTime.parse ARGV.first

# Loop over each line of your logfile.
File.foreach(ARGV.last) do |line|
  # Use a rather naive regex to extract the timestamp from each line.
  next if line !~ /^\[.*?\]/

  # Print lines that contain a later timestamp than your target.
  puts line if DateTime.parse($&) > timestamp
end

The script takes two positional arguments:

  1. A timestamp that resembles RFC 2822, with or without a time zone offset.
  2. A file to parse.

The script then compares the timestamp on each line against the timestamp passed as an argument, and prints only the lines that are later than it. You can change the comparison from > to >= if you really mean "later than or equal to" your given timestamp, which may be more intuitive.

For example:

ruby /tmp/parse_log_dates.rb "Tue Oct  4 11:55:18 2016" /path/to/logfile

works just fine on the very limited corpus you provided. Your real-world results may vary, especially if your log files don't actually contain a timestamp on each line.
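If some lines lack a timestamp, or the input comes from a pipe rather than a file, a variant along the following lines may help. This is only a sketch: lines_after is a hypothetical helper name, the sample log lines are shortened stand-ins, and it assumes that lines without a parseable leading timestamp should simply be dropped.

```ruby
require 'date'
require 'stringio'

# Hypothetical helper (not part of the script above): returns the lines of
# io whose leading [timestamp] is later than cutoff.
def lines_after(io, cutoff)
  io.each_line.select do |line|
    stamp = line[/\A\[(.*?)\]/, 1]    # text inside the first pair of brackets
    next false unless stamp           # no leading [..] at all: skip the line

    begin
      DateTime.parse(stamp) > cutoff
    rescue ArgumentError              # bracketed text that is not a date
      false
    end
  end
end

# Shortened, made-up sample input standing in for the real log.
sample = StringIO.new(<<~LOG)
  [Tue Oct  4 11:55:09 2016] [hphp] [] Fatal error: first
  [Tue Oct  4 11:55:19 2016] [hphp] [] Fatal error: second
  a continuation line without a timestamp
LOG

cutoff = DateTime.parse('Tue Oct  4 11:55:10 2016')
puts lines_after(sample, cutoff)
```

Passing $stdin instead of the StringIO would let the filter sit at the end of a pipeline, such as the check_logwarn command from the question.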




Answer 2:


This uses a reference timestamp and compares the timestamp from the log file to it; if the log file's timestamp is more recent, the line gets printed:

awk -v refdate="$(date +'%s' -d 'Mon Oct 3 10:00:00 2016')" -F "[][]" '
    {
        cmd = "date +\047%s\047 -d \"" $2 "\""
        if ((cmd | getline val) > 0) {
            if (val > refdate)
                print
        }
        close(cmd)
    }
' infile

Here is how it works:

  • -v refdate="$(date +'%s' -d 'Mon Oct 3 10:00:00 2016')" converts the date given (our reference date) to seconds since the epoch.
  • -F "[][]" sets the field separator to square brackets, so the timestamp we want is simply $2.
  • "date +\047%s\047 -d \"" $2 "\"" is the shell command we'd like to execute; it becomes date +'%s' -d "$2", i.e., it converts the log file timestamp to seconds since the epoch. \047 is a single quote.
  • command | getline val evaluates command and assigns the result to val, so val now holds the timestamp from the log file in seconds since the epoch.
    • We check the success of getline with (cmd | getline val) > 0.
  • If getline was successful, if (val > refdate) print compares the log file timestamp to the reference date and, if the log file timestamp is more recent, prints the line.
  • close(cmd) closes the pipeline.
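For comparison, the same steps can be sketched in Ruby. The awk version starts a date process for every input line, which may become slow on files of several hundred MB, whereas this sketch parses the timestamp in-process. The helper name newer_than? and the sample line are assumptions for illustration. Note also that DateTime.parse treats a timestamp without a zone as UTC, while date -d uses the local time zone; the comparison stays consistent as long as both sides are converted the same way.

```ruby
require 'date'

# Hypothetical helper: true if the bracketed timestamp at the start of
# line is later than refdate (seconds since the epoch).
def newer_than?(line, refdate)
  stamp = line.split(/[\[\]]/)[1]     # mirrors -F "[][]" and awk's $2
  return false if stamp.nil?

  DateTime.parse(stamp).to_time.to_i > refdate
rescue ArgumentError                  # the field was not a parseable date
  false
end

# Reference date in epoch seconds, like refdate in the awk command.
refdate = DateTime.parse('Mon Oct 3 10:00:00 2016').to_time.to_i

# Shortened, made-up sample line.
line = '[Tue Oct  4 11:55:19 2016] [hphp] [] Fatal error: example'
puts line if newer_than?(line, refdate)
```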

References

  • date -d is very flexible and understands a lot of formats in the date string, see the date manual.
  • getline in the gawk user manual and on freeshell.org (hat tip Ed Morton, who also pointed out how to properly use getline in his helpful comment)


Source: https://stackoverflow.com/questions/39853960/parsing-lines-from-a-log-file-containing-date-time-greater-than-something
