Question
Hi, I have the following log file structure:
####<19-Jan-2015 07:16:47 o'clock UTC> <Notice> <Stdout> <example.com>
####<20-Jan-2015 07:16:43 o'clock UTC> <Notice> <Stdout> <example2.com>
####<21-Jan-2015 07:16:48 o'clock UTC> <Notice> <Stdout> <example3.com>
How can I filter this file by a date interval, for example: show all data between the 19th and 20th of January 2015? I tried to use awk, but I have problems converting 19-Jan-2015 to 2015-01-19 so that I can compare the dates.
Answer 1:
For an oddball date format like that, I'd outsource the date parsing to the date utility:
#!/usr/bin/awk -f

# Formats the timestamp as a number, so that higher numbers represent
# a later timestamp. This will not handle the time zone because date
# can't handle the o'clock notation. I hope all your timestamps use the
# same time zone, otherwise you'll have to hack support for it in here.
function datefmt(d) {
    # make d compatible with singly-quoted shell strings
    gsub(/'/, "'\\''", d)

    # then run the date command and get its output
    command = "date -d '" d "' +%Y%m%d%H%M%S"
    command | getline result
    close(command)

    # that's our result.
    return result;
}

BEGIN {
    # Field separator, so the part of the timestamp we'll parse is in $2 and $3
    FS = "[< >]+"

    # start, end set here.
    start = datefmt("19-Jan-2015 00:00:00")
    end = datefmt("20-Jan-2015 23:59:59")
}

{
    # convert the timestamp into an easily comparable format
    stamp = datefmt($2 " " $3)

    # then print only lines in which the time stamp is in the range.
    if(stamp >= start && stamp <= end) {
        print
    }
}
Answer 2:
If the file is named example.txt, the script below should work:
for i in $(awk -F'<' '{print $2}' example.txt | awk '{print $1"_"$2}'); do
    date=$(echo "$i" | sed 's/_/ /g')
    dunix=$(date -d "$date" +%s)
    if [[ $dunix -ge 1421605800 && $dunix -le 1421778599 ]]; then
        grep "$date" example.txt
    fi
done
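The script converts each timestamp in the file to a Unix timestamp, compares it with the two bounds, and prints the matching lines from the file. The bounds are hard-coded epoch values: 1421605800 and 1421778599 are 2015-01-19 00:00:00 and 2015-01-20 23:59:59 in a UTC+05:30 time zone, so adjust them for your own time zone and range.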
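Rather than hard-coding epoch values, the bounds can be computed with the same date -d call the loop already relies on. This is only a minimal sketch: it assumes GNU date and that the timestamps are UTC, as the sample log lines say, and the variable names from and to are illustrative.

```shell
# Compute the range bounds as epoch seconds instead of hard-coding them.
# -u makes date read the given time as UTC, matching the "UTC" in the log lines.
from=$(date -u -d '2015-01-19 00:00:00' +%s)
to=$(date -u -d '2015-01-20 23:59:59' +%s)
echo "$from $to"   # prints: 1421625600 1421798399
```

For the comparison to stay consistent, the date call inside the loop would then also need -u, so that each log timestamp is read as UTC too.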
Answer 3:
Using string comparisons will be faster than creating date objects:
awk -F '<' '
{split($2, d, /[- ]/)}
d[3]=="2015" && d[2]=="Jan" && 19<=d[1] && d[1]<=20
' file
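The comparison above only covers a range inside a single month. One possible way to extend the same idea to arbitrary ranges is to rebuild each date as a YYYYMMDD number, which sorts in calendar order. The sketch below is not part of the original answer: the function name filter_by_date and the variables from, to, mon and key are hypothetical, and it assumes the log layout shown in the question with English three-letter month names. It uses only POSIX awk features.

```shell
# filter_by_date FROM TO FILE   (FROM and TO as YYYYMMDD, both inclusive)
filter_by_date() {
    awk -F '<' -v from="$1" -v to="$2" '
    BEGIN {
        # Lookup table from month name to month number
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (i = 1; i <= n; i++) mon[names[i]] = i
    }
    {
        # $2 starts with the date, e.g. 19-Jan-2015
        split($2, d, /[- ]/)
        # Numeric key such as 20150119; it is 0 for lines without a date
        key = d[3] * 10000 + mon[d[2]] * 100 + d[1]
    }
    # Print the line when its key falls inside the inclusive range
    key >= from && key <= to
    ' "$3"
}
```

With the sample file from the question, filter_by_date 20150119 20150120 file would print the first two lines.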
Answer 4:
Another way, using mktime, all in awk (mktime and the three-argument form of match are GNU awk extensions, so this needs gawk):
awk '
    BEGIN{
        From=mktime("2015 01 19 00 00 00")
        To=mktime("2015 01 20 00 00 00")
    }
    {Time=0}
    match($0,/<([^ ]+) ([^ ]+)/,a){
        split(a[1],b,"-")
        split(a[2],c,":")
        b[2]=(index("JanFebMarAprMayJunJulAugSepOctNovDec",b[2])+2)/3
        Time=mktime(b[3]" "b[2]" "b[1]" "c[1]" "c[2]" "c[3])
    }
    Time<To&&Time>From
' file
Output
####<19-Jan-2015 07:16:47 o'clock UTC> <Notice> <Stdout> <example.com>
How it works
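BEGIN{
    From=mktime("2015 01 19 00 00 00")
    To=mktime("2015 01 20 00 00 00")
}
Before any lines are processed, set From and To to the bounds of the range we want. mktime requires its argument in the format YYYY MM DD HH MM SS. Note that To is midnight at the start of 20 January, which is why only the entry from the 19th appears in the output; use "2015 01 21 00 00 00" to include all of the 20th.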
{Time=0}
Resets Time so that lines which don't match the pattern are not printed.
match($0,/<([^ ]+) ([^ ]+)/,a)
Matches the first two words after the < and stores them in a. The next block is executed only if the match succeeds.
split(a[1],b,"-")
split(a[2],c,":")
Splits the date into day, month name and year, and the time into hours, minutes and seconds.
b[2]=(index("JanFebMarAprMayJunJulAugSepOctNovDec",b[2])+2)/3
Converts the month name to a number. Every name is three characters long, so its position in the string is 1, 4, 7 and so on; adding 2 and dividing by 3 turns that into 1, 2, 3 and so on.
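A quick way to check this arithmetic is to run it on a few month names. The snippet below is a minimal sketch using plain POSIX awk; the shell function month_num and the variables name, pos and num are hypothetical names introduced for illustration. It assumes a well-formed three-letter month name and does not validate anything else.

```shell
# month_num NAME: print the month number for a three-letter English month name
month_num() {
    awk -v name="$1" 'BEGIN {
        pos = index("JanFebMarAprMayJunJulAugSepOctNovDec", name)
        # index returns 0 when name does not occur in the string
        num = (pos > 0) ? (pos + 2) / 3 : 0
        print num
    }'
}

month_num Jan   # prints 1
month_num Dec   # prints 12
```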
Time=mktime(b[3]" "b[2]" "b[1]" "c[1]" "c[2]" "c[3])
Builds the timestamp from the collected values.
Time<To&&Time>From
If Time is greater than From and less than To, the line is inside the desired range; with no action given, awk's default is to print it.
Resources
https://www.gnu.org/software/gawk/manual/html_node/Time-Functions.html
Source: https://stackoverflow.com/questions/28275880/how-to-filter-data-between-2-dates-with-awk-in-a-bash-script