Question
I have log files several hundred MB in size, containing lines like the one below, with the date and time at the beginning:
[Tue Oct 4 11:55:19 2016] [hphp] [25376:7f5d57bff700:279809:000001] [] \nFatal error: syntax error, unexpected T_ENCAPSED_AND_WHITESPACE, expecting ')' in /var/cake_1.2.0.6311-beta/app/webroot/openx/www/delivery/postGetAd.php(12479)(62110d90541a84df30dd077ee953e47c) : eval()'d code on line 1
I have a plugin (nagios check_logwarn) that prints only the lines containing certain error strings. This is the command to run it:
/usr/local/nagios/libexec/check_logwarn -d /tmp/logwarn -p /mnt/log/hiphop/error_20161003.log "^.*Fatal error*"
I want to filter further, based on the date-time, i.e., keep only the lines that come after, say, 11:55:10.
I am not sure whether to use regex for this. Following is what I have so far:
/usr/local/nagios/libexec/check_logwarn -d /tmp/logwarn -p /mnt/log/hiphop/error_20161003.log "^.*Fatal error*" | grep "15\:19\:1*"
But this will only filter those logs whose time is in the 19th minute of the 15th hour.
Update
I am now able to compare the time part of the date-time.
/usr/local/nagios/libexec/check_logwarn -d /tmp/logwarn -p /mnt/log/hiphop/error_20161004.log "^.*Fatal error*" | awk '$4 > "14:22:11"'
How do I compare the day part?
Update 2 - opening bounty
I am having to open a bounty because I do not have much expertise with shell and I need a solution soon.
I am stuck at the part of comparing the dates. With the solution at https://stackoverflow.com/a/39856560/351903, I am facing this problem. If that is fixed, I would be happy.
I am also open to some enhancement to this (I don't mind if the output has some jumbled up order of logs) -
/usr/local/nagios/libexec/check_logwarn -d /tmp/logwarn -p /mnt/log/hiphop/error_20161004.log "^.*Fatal error*" | awk '$4 > "14:22:11"'
I looked for some date-time to timestamp comparison, but couldn't find something working.
I am not able to proceed from what is given in this question. I cannot see the timestamp value using this -
echo date -d '06/12/2012 07:21:22' +"%s"
Not sure what I am missing.
Answer 1:
You Need Comparable Date Representations
Regular expressions are okay for extracting data, but a terrible way to compare dates to one another. You actually need to convert your timestamps to something comparable, such as Epoch time or DateTime objects. If you want to find all the lines that contain a timestamp greater than some other timestamp, you need to parse out the timestamp in each line for comparison.
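As a small illustration of why a comparable representation matters, the sketch below converts two log-style timestamps to seconds since the epoch and compares them as integers. It assumes GNU date (the -d option behaves differently in BSD/macOS date), and the helper name to_epoch is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: convert a log-style timestamp to seconds since the epoch.
# Assumes GNU date, whose -d option parses a free-form date string.
to_epoch() {
    date -d "$1" +%s
}

earlier=$(to_epoch 'Mon Oct 3 23:59:59 2016')
later=$(to_epoch 'Tue Oct 4 00:00:01 2016')

# Integer comparison works across day boundaries, unlike comparing
# only the HH:MM:SS field as strings.
if [ "$later" -gt "$earlier" ]; then
    echo "Tue Oct 4 00:00:01 2016 is later"
fi
```

Note that the date command has to be executed, either directly or through $(...), for the number to appear. Prefixing it with echo only prints the words of the command back, which is why the attempt in the question shows no timestamp.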
A Ruby Example
#!/usr/bin/env ruby
require 'date'

# Convert your given timestamp to something comparable.
timestamp = DateTime.parse ARGV.first

# Loop over each line of your logfile.
File.open(ARGV.last).each_line do |line|
  # Use a rather naive regex to extract the timestamp from each line.
  next if line !~ /^\[.*?\]/

  # Print lines that contain a later timestamp than your target.
  puts line if DateTime.parse($&) > timestamp
end
The script takes two positional arguments:
- A timestamp that resembles RFC 2822, with or without a time zone offset.
- A file to parse.
The script then compares the timestamp on each line, and only prints lines that are later than the timestamp passed as an argument. You can modify the comparison from > to >= if you really mean "later than or equal to" your given timestamp, which may be more intuitive.
For example:
ruby /tmp/parse_log_dates.rb "Tue Oct 4 11:55:18 2016" /path/to/logfile
works just fine on the very limited corpus you provided. Your real-world results may vary, especially if your log files don't actually contain a timestamp on each line.
Answer 2:
This uses a reference timestamp and compares the timestamp from the log file to it; if the log file's time stamp is more recent, the line gets printed:
awk -v refdate="$(date +'%s' -d 'Mon Oct 3 10:00:00 2016')" -F "[][]" '
{
    cmd = "date +\047%s\047 -d \"" $2 "\""
    if ((cmd | getline val) > 0) {
        if (val > refdate)
            print
    }
    close(cmd)
}
' infile
Here is how it works:
- -v refdate="$(date +'%s' -d 'Mon Oct 3 10:00:00 2016')" converts the date given (our reference date) to seconds since the epoch.
- -F "[][]" sets the field separator to square brackets, so the timestamp we want is simply $2.
- "date +\047%s\047 -d \"" $2 "\"" is the shell command we'd like to execute; it becomes date +'%s' -d "$2", i.e., it converts the log file timestamp to seconds since the epoch. \047 is a single quote.
- command | getline val evaluates command and assigns the result to val, so val now holds the timestamp from the log file in seconds since the epoch.
  - We check the success of getline with (cmd | getline val) > 0.
  - If getline was successful, if (val > refdate) print compares the log file timestamp to the reference date and, if the log file timestamp is more recent, prints the line.
- close(cmd) closes the pipeline.
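One caveat with the approach above is that it starts a separate date process for every input line, which can get slow on logs of several hundred MB. The sketch below is a variation, not the original answer's method: it does the conversion inside awk with mktime(). It assumes an awk that provides mktime() (gawk does; it is not part of POSIX awk), and the names filter_after and to_epoch are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the log lines whose bracketed timestamp is later
# than the reference timestamp given as $1. Log lines are read from stdin.
# Assumes an awk that provides mktime() (gawk does; POSIX awk does not).
filter_after() {
    awk -v ref="$1" -F '[][]' '
        # Convert "Tue Oct 4 11:55:19 2016" to seconds since the epoch.
        # Returns -1 if the string does not have the expected five fields.
        function to_epoch(s,    p, t, pos) {
            if (split(s, p, " ") != 5) return -1
            pos = index("JanFebMarAprMayJunJulAugSepOctNovDec", p[2])
            if (pos == 0) return -1
            split(p[4], t, ":")
            # mktime() expects "YYYY MM DD HH MM SS" in local time.
            return mktime(p[5] " " ((pos + 2) / 3) " " p[3] " " t[1] " " t[2] " " t[3])
        }
        BEGIN { refdate = to_epoch(ref) }
        # $2 is the text between the first pair of square brackets.
        to_epoch($2) > refdate
    '
}
```

With the command from the question, this could be used as: /usr/local/nagios/libexec/check_logwarn -d /tmp/logwarn -p /mnt/log/hiphop/error_20161004.log "^.*Fatal error*" | filter_after 'Tue Oct 4 11:55:10 2016'. Both the reference and the log timestamps are interpreted in the local time zone, so they stay comparable with each other. Lines without a parsable leading timestamp are dropped.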
References
- date -d is very flexible and understands a lot of formats in the date string; see the date manual.
- getline in the gawk user manual and on freeshell.org (hat tip to Ed Morton, who also pointed out how to properly use getline in his helpful comment)
Source: https://stackoverflow.com/questions/39853960/parsing-lines-from-a-log-file-containing-date-time-greater-than-something