How to split log file in bash based on time condition

Question

I have a simple log file with timestamps that include fractional seconds, like this one:

[02/03/2020 07:53:58.859000] 5
[02/03/2020 07:53:58.935300] 2
[02/03/2020 10:04:50.355600] 0
[02/03/2020 10:04:51.028900] 1
[02/03/2020 10:38:13.468200] 6

For better readability, I want to separate groups of entries that are more than 2 seconds apart with a line of dashes, like this:

[02/03/2020 07:53:58.859000] 5
[02/03/2020 07:53:58.935300] 2
------------------------------
[02/03/2020 10:04:50.355600] 0
[02/03/2020 10:04:51.028900] 1
------------------------------
[02/03/2020 10:38:13.468200] 6

How can I achieve this with a simple loop in a bash script? So far I have figured out how to parse and shift a date taken from a string with NEW_VALUE1="$(date -d "$VALUE 2 seconds" +'%d/%m/%Y %H:%M:%S')", but I have not managed to turn that into a working solution.

Answer 1:

With GNU awk:

awk -F'[[/:. ]' '
  { t=mktime($4" "$3" "$2" "$5" "$6" "$7) }
  NR>1 && t>tlast+2 { print "------------------------------" }; 1
  { tlast=t }
' file

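Use [, /, :, . and the space character as field separator characters and create a timestamp t for each line with mktime. Because every line starts with [, field 1 is empty, so the day, month and year are $2, $3 and $4.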
Print a separator line if this is not the first line and if t > tlast + 2.
Print the current line.
Assign value of t to tlast.
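
To check which field numbers this separator produces, it can help to print them for one sample line. A minimal sketch, where show_fields is a hypothetical helper name and only the first seven fields are printed:

```shell
# Hypothetical helper: print fields 1-7 of one log line, split the same way
# as in the answer. No mktime is needed here, so any awk should do.
show_fields() {
    printf '%s\n' "$1" |
        awk -F'[[/:. ]' '{ for (i = 1; i <= 7; i++) printf "$%d=%s\n", i, $i }'
}

show_fields '[02/03/2020 07:53:58.859000] 5'
```

The year shows up as $4, the month as $3 and the day as $2, which is the order the answer passes to mktime.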

Answer 2:

Convince yourself of the following (or point out where I'm wrong):

Given two consecutive lines, the second (call it y) belongs to the same section as the first (call it x) if they match up to the last colon and

If s(x) is even, then s(y) lies in the interval [s(x), s(x)+1].
If s(x) is odd, then s(y) lies in the interval [s(x)-1, s(x)].

where s(x) is the floor of the seconds value of line x. E.g., for the first line provided, s(x)=58. The next line should be in the same section, because the string is the same up to the last colon and s(y)=58 ∈ [58,59].

Then you have this awk script:

awk -F: '
    # print a separator when the line leaves the current section (never before line 1)
    NR>1 && !((int($3)==i1 || int($3)==i2) && min==$2 && datehour==$1) {print "----";}
    {
        sec=int($3)
        min=$2
        datehour=$1
        if (sec % 2 == 0) {i1=sec;i2=sec+1}
        else {i1=sec-1;i2=sec}
        print
    }
' logfile
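
One way to read the even/odd rule above: two lines pass the seconds test exactly when their whole-second values, halved and rounded down, are equal. The sketch below makes that explicit; bucket is a hypothetical helper and not part of the answer.

```shell
# Hypothetical helper: two-second bucket index for a seconds field such as 58.859000.
bucket() {
    s=${1%%.*}               # drop the fraction: 58.859000 -> 58
    s=${s#0}                 # drop one leading zero so 08 is not read as octal
    echo $(( ${s:-0} / 2 ))  # 58 and 59 both give 29
}

bucket 58.859000
bucket 58.935300
bucket 57.999000
```

Read this way, 57.999000 and 58.000000 land in different buckets although they are a millisecond apart, so the script groups by fixed two-second windows rather than by the gap between consecutive lines.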

Answer 3:

First, if you have GNU awk or mawk, use the awk solution with mktime; it will be much faster than looping in a bash script. However, you asked for a bash solution, and that is fine if you are dealing with fewer than a thousand lines or so.

The way to simplify adding the separators is to convert each date to seconds since the epoch. Checking whether a separator is needed is then a simple comparison: is the current timestamp more than 2 seconds past the last saved one? If it is, output the separator (unless this is the first line) and update the saved seconds to the current value. Output the line read from the file on every iteration regardless.
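
As a small illustration of that comparison, here is a sketch that assumes GNU date. to_epoch is a hypothetical helper, and TZ=UTC is only there so the result does not depend on the local time zone:

```shell
# Hypothetical helper: convert "YYYY-MM-DD HH:MM:SS" to seconds since the epoch.
# Assumes GNU date, whose -d option parses a date string.
to_epoch() {
    TZ=UTC date -d "$1" +%s
}

prev=$(to_epoch '2020-03-02 07:53:58')
curr=$(to_epoch '2020-03-02 10:04:50')

# A gap of more than 2 seconds means a separator belongs between the two lines.
if [ $(( curr - prev )) -gt 2 ]; then
    echo "separator needed (gap: $(( curr - prev )) s)"
fi
```

Once both stamps are plain integers, the check is ordinary shell arithmetic.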

You can write this fairly easily by reading each line from the file with read and then using parameter expansion with substring removal to trim from the right through ']' and from the left through '[', leaving only the date, which can be used with date -d as you have attempted. You can do something similar to:

secs=0      # epoch seconds of the line that started the current group

while IFS= read -r line; do         # read each line in log
    dstr="${line%]*}"               # trim from right through ']'
    dstr="${dstr#*[}"               # trim from left through '['
    # GNU date reads nn/nn/yyyy as mm/dd/yyyy, so reorder dd/mm/yyyy first
    IFS='/ ' read -r dd mm yyyy tm <<< "$dstr"
    epoch=$(date -d "$yyyy-$mm-$dd $tm" +%s)    # get seconds since epoch
    if (( epoch - secs > 2 )); then # if current date is more than 2 s past secs
        # if not first line, output the separator
        ((secs > 0)) && printf -- "------------------------------\n"
        secs="$epoch"               # update secs to epoch
    fi
    echo "$line"                    # output each line
done < file
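
To see what the two substring removals in the loop leave behind, they can be tried on a single line. A minimal sketch; extract_stamp is a hypothetical helper name, and the [ is escaped here only so the pattern is unambiguous in other shells too:

```shell
# Hypothetical helper: return the text between '[' and ']' of one log line.
extract_stamp() {
    stamp=${1%]*}           # remove the shortest suffix starting at ']'
    stamp=${stamp#*\[}      # remove the shortest prefix ending at '['
    printf '%s\n' "$stamp"
}

extract_stamp '[02/03/2020 07:53:58.859000] 5'
```

Keep in mind that GNU date reads a slash-separated date as mm/dd/yyyy, so a day-first stamp such as this one needs reordering before it is passed to date -d.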

Example Use/Output

With your input in file you get:

[02/03/2020 07:53:58.859000] 5
[02/03/2020 07:53:58.935300] 2
------------------------------
[02/03/2020 10:04:50.355600] 0
[02/03/2020 10:04:51.028900] 1
------------------------------
[02/03/2020 10:38:13.468200] 6

While this is relatively simple with date -d and comparisons, for large logs an awk solution using mktime (if you have GNU awk or mawk) will be orders of magnitude faster than a shell script solution.

Source: https://stackoverflow.com/questions/60498465/how-to-split-log-file-in-bash-based-on-time-condition

Tags

bash

loops

sorting