How to filter data between 2 dates with awk in a bash script [duplicate]

Question

Hi, I have the following log file structure:

####<19-Jan-2015 07:16:47 o'clock UTC> <Notice> <Stdout> <example.com>
####<20-Jan-2015 07:16:43 o'clock UTC> <Notice> <Stdout> <example2.com>
####<21-Jan-2015 07:16:48 o'clock UTC> <Notice> <Stdout> <example3.com>

How can I filter this file by a date interval, for example: show all data between the 19th and 20th of January 2015?

I tried to use awk, but I have problems converting 19-Jan-2015 to 2015-01-19 so that I can compare the dates.

Answer 1:

For an oddball date format like that, I'd outsource the date parsing to the date utility.

#!/usr/bin/awk -f

# Formats the timestamp as a number, so that higher numbers represent
# a later timestamp. This will not handle the time zone because date
# can't handle the o'clock notation. I hope all your timestamps use the
# same time zone, otherwise you'll have to hack support for it in here.
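# command and result are extra parameters so that they act as local
# variables; result is then empty if the date command fails.
function datefmt(d,    command, result) {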
  # make d compatible with singly-quoted shell strings
  gsub(/'/, "'\\''", d)

  # then run the date command and get its output
  command = "date -d '" d "' +%Y%m%d%H%M%S"
  command | getline result
  close(command)

  # that's our result.
  return result;
}

BEGIN {
  # Field separator, so the part of the timestamp we'll parse is in $2 and $3
  FS = "[< >]+"

  # start, end set here.
  start = datefmt("19-Jan-2015 00:00:00")
  end   = datefmt("20-Jan-2015 23:59:59")
}

{
  # convert the timestamp into an easily comparable format
  stamp = datefmt($2 " " $3)

  # then print only lines in which the time stamp is in the range.
  if(stamp >= start && stamp <= end) {
    print
  }
}
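
To check what datefmt returns for a given timestamp, the same date call can be run directly from the shell. This is only a minimal sketch, assuming GNU date (which accepts the DD-Mon-YYYY form); the helper name to_stamp is hypothetical and not part of the script above.

```shell
# to_stamp is a hypothetical helper that mirrors the date call inside datefmt.
# Assumes GNU date; the input is read and printed in the local time zone.
to_stamp() {
  date -d "$1" +%Y%m%d%H%M%S
}

# Example:
#   to_stamp "19-Jan-2015 07:16:47"    prints 20150119071647
```

Because every field is zero-padded and ordered from year down to second, a later timestamp always gives a larger number, which is what the range check in the script relies on.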

Answer 2:

If the name of the file is example.txt, the script below should work:

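 # Range bounds as Unix timestamps, computed in the local time zone
 start=$(date -d '19-Jan-2015 00:00:00' +%s)
 end=$(date -d '20-Jan-2015 23:59:59' +%s)
 for i in $(awk -F'<' '{print $2}' example.txt | awk '{print $1"_"$2}'); do
   date=${i//_/ }
   dunix=$(date -d "$date" +%s)
   if [[ $dunix -ge $start && $dunix -le $end ]]; then
     grep -F -- "$date" example.txt
   fi
 done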

The script converts each timestamp in the file to a Unix timestamp (seconds since the epoch), compares it against the two bounds, and prints the lines from the file that meet the condition.
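
The conversion and comparison described above can be isolated in a small helper. This is a sketch only, assuming GNU date; the name in_range and its argument order are hypothetical.

```shell
# in_range STAMP FROM TO (hypothetical helper)
# Succeeds when STAMP lies in the inclusive range [FROM, TO].
# All three arguments are date strings that GNU date can parse;
# returns 2 if any of them cannot be converted.
in_range() {
  s=$(date -d "$1" +%s) && f=$(date -d "$2" +%s) && t=$(date -d "$3" +%s) || return 2
  [ "$s" -ge "$f" ] && [ "$s" -le "$t" ]
}

# Example:
#   in_range "19-Jan-2015 07:16:47" "19-Jan-2015 00:00:00" "20-Jan-2015 23:59:59" && echo inside
```

All three strings are interpreted in the same (local) time zone, so the comparison does not depend on where the script is run.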

Answer 3:

Using string comparisons will be faster than creating date objects:

awk -F '<' '
    {split($2, d, /[- ]/)} 
    d[3]=="2015" && d[2]=="Jan" && 19<=d[1] && d[1]<=20
' file
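
The comparison above fixes the year and month, so it covers ranges inside a single month. One possible way to generalize it, still without calling date or mktime, is to build a sortable YYYYMMDD key for each line and compare that. This is a sketch under the assumption that every line has the layout shown in the question; the function name filter_by_key and its arguments are hypothetical.

```shell
# filter_by_key FILE FROM TO (hypothetical helper)
# FROM and TO are inclusive bounds written as YYYYMMDD.
# Uses only POSIX awk features.
filter_by_key() {
  awk -F '<' -v from="$2" -v to="$3" '
    {
      # d[1] = day, d[2] = month name, d[3] = year
      split($2, d, /[- ]/)
      # month name to 1..12 via its position in the lookup string
      m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", d[2]) + 2) / 3
      # fixed-width key, so string order equals date order
      key = sprintf("%04d%02d%02d", d[3], m, d[1])
    }
    key >= from && key <= to
  ' "$1"
}

# Example:
#   filter_by_key file 20150119 20150120
```

Lines without a recognizable date produce a key that falls below any real bound, so they are not printed.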

Answer 4:

Another way, using mktime, entirely within awk (this needs gawk: mktime and the three-argument form of match are GNU extensions):

awk '

BEGIN{
        From=mktime("2015 01 19 00 00 00")
        To=mktime("2015 01 20 00 00 00")
}
{Time=0}
match($0,/<([^ ]+) ([^ ]+)/,a){
        split(a[1],b,"-")
        split(a[2],c,":")
        b[2]=(index("JanFebMarAprMayJunJulAugSepOctNovDec",b[2])+2)/3
        Time=mktime(b[3]" "b[2]" "b[1]" "c[1]" "c[2]" "c[3])
}
Time<To&&Time>From

' file

Output

####<19-Jan-2015 07:16:47 o'clock UTC> <Notice> <Stdout> <example.com>

How it works

BEGIN{
        From=mktime("2015 01 19 00 00 00")
        To=mktime("2015 01 20 00 00 00")
}

Before processing any lines, set From and To; the lines we want have timestamps strictly between the two.
mktime requires its argument in this format: YYYY MM DD HH MM SS.
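
To see what mktime does with that format, it can be called on its own. A minimal sketch, assuming gawk is installed; the helper name epoch_of is hypothetical, and TZ=UTC is set only so the result does not depend on the local time zone.

```shell
# epoch_of "YYYY MM DD HH MM SS" (hypothetical helper)
# Prints the Unix timestamp gawk's mktime computes for the given spec.
# Assumes gawk; mktime is not available in POSIX awk.
epoch_of() {
  TZ=UTC gawk -v spec="$1" 'BEGIN { print mktime(spec) }'
}

# Examples:
#   epoch_of "2015 01 19 00 00 00"    prints 1421625600
#   epoch_of "2015 01 19"             prints -1 (too few fields)
```
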

{Time=0}

Resets Time on every line, so that lines that don't match the pattern are not printed because of a value left over from an earlier line.

match($0,/<([^ ]+) ([^ ]+)/,a)

Matches the first two words after the < and stores them in a. Executes the next block if this is successful.

    split(a[1],b,"-")
    split(a[2],c,":")

Splits the date into day, month name, and year, and the time into hours, minutes, and seconds.

b[2]=(index("JanFebMarAprMayJunJulAugSepOctNovDec",b[2])+2)/3

Converts the month name to a number. index returns the 1-based position of the name in the string (1 for Jan, 4 for Feb, 7 for Mar, and so on), so adding 2 and dividing by 3 yields 1 through 12.
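
The arithmetic can be tried in isolation. A small sketch that works in any POSIX awk; the helper name month_num is hypothetical.

```shell
# month_num NAME (hypothetical helper)
# Prints the month number for a three-letter English month name,
# using the same index-based lookup as the answer.
month_num() {
  awk -v m="$1" 'BEGIN { print (index("JanFebMarAprMayJunJulAugSepOctNovDec", m) + 2) / 3 }'
}

# Examples:
#   month_num Jan    prints 1
#   month_num Mar    prints 3
#   month_num Dec    prints 12
```
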

 Time=mktime(b[3]" "b[2]" "b[1]" "c[1]" "c[2]" "c[3])

Builds the timestamp from the collected values.

Time<To&&Time>From

If Time is greater than From and less than To, the line is inside the desired range. The pattern has no action attached, so awk applies its default action and prints the line.

Resources

https://www.gnu.org/software/gawk/manual/html_node/Time-Functions.html

Source: https://stackoverflow.com/questions/28275880/how-to-filter-data-between-2-dates-with-awk-in-a-bash-script

Tags

Linux

bash

date

awk

gnu