Question
I have a simple log file with timestamps in milliseconds, like this one:
[02/03/2020 07:53:58.859000] 5
[02/03/2020 07:53:58.935300] 2
[02/03/2020 10:04:50.355600] 0
[02/03/2020 10:04:51.028900] 1
[02/03/2020 10:38:13.468200] 6
To improve readability, I want to insert a dashed separator line wherever consecutive entries are more than about 2 seconds apart, like this:
[02/03/2020 07:53:58.859000] 5
[02/03/2020 07:53:58.935300] 2
------------------------------
[02/03/2020 10:04:50.355600] 0
[02/03/2020 10:04:51.028900] 1
------------------------------
[02/03/2020 10:38:13.468200] 6
How can I achieve this with a simple loop in a bash script? So far I have figured out how to format and shift a date parsed from a string:
NEW_VALUE1="$(date -d "$VALUE 2 seconds" +'%d/%m/%Y %H:%M:%S')"
but I have had no luck turning it into a working result.
Answer 1:
With GNU awk:
awk -F'[[/:. ]' '
{ t=mktime($4" "$3" "$2" "$5" "$6" "$7) }
NR>1 && t>tlast+2 { print "------------------------------" }; 1
{ tlast=t }
' file
- Use [, /, :, . and the space character as field separator characters and create a timestamp t for each line.
- Print a separator line if this is not the first line and if t > tlast + 2.
- Print the current line.
- Assign the value of t to tlast.
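As a quick illustration of how that field separator carves up a log line (a minimal sketch using the first sample line; GNU awk's mktime expects its argument in "YYYY MM DD HH MM SS" order, which is why the fields are reassembled as $4, $3, $2, $5, $6, $7):
echo '[02/03/2020 07:53:58.859000] 5' |
awk -F'[[/:. ]' '{ printf "day=%s month=%s year=%s time=%s:%s:%s\n", $2, $3, $4, $5, $6, $7 }'
# day=02 month=03 year=2020 time=07:53:58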
Answer 2:
Convince yourself of the following (or point out where I'm wrong):
Given two consecutive lines, the second (call it y) belongs to the same section as the first (call it x) if they match up to the last : and
- if s(x) is even, then s(y) lies in the interval [s(x), s(x)+1];
- if s(x) is odd, then s(y) lies in the interval [s(x)-1, s(x)];
where s(x) is the floor of the seconds value of line x. For example, for the first line provided, s(x) = 58. The next line should be in the same section, because the string is the same up to the last colon and s(y) = 58 ∈ [58, 59].
Then you have this awk script:
awk -F: '
# print a separator when the current line falls outside the 2-second bucket of the previous line
!((int($3)==i1 || int($3)==i2) && min==$2 && datehour==$1) {print "----";}
{
    sec=int($3)                             # whole seconds of the current line
    min=$2                                  # minutes field
    datehour=$1                             # everything up to the first colon (date plus hour)
    if (sec % 2 == 0) {i1=sec;i2=sec+1}     # even second: bucket is [sec, sec+1]
    else {i1=sec-1;i2=sec}                  # odd second: bucket is [sec-1, sec]
    print                                   # always print the current line
}
' logfile
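As a quick sanity check of how -F: splits a log line into the three pieces the script compares (a minimal sketch using the first sample line):
echo '[02/03/2020 07:53:58.859000] 5' |
awk -F: '{ printf "datehour=%s  min=%s  sec=%d\n", $1, $2, int($3) }'
# datehour=[02/03/2020 07  min=53  sec=58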
Answer 3:
First, if you have GNU awk or mawk, use the awk solution with mktime; it will be much faster than looping in a bash script. However, you have asked for a bash solution, and that is fine if you are dealing with fewer than a thousand lines or so.
The way to simplify adding the separators is to convert each date to seconds since the epoch. Checking whether a separator is needed is then a simple matter of comparing the current timestamp against the last saved one plus 2 seconds. If the gap is larger and this is not the first line, output the separator and update the saved seconds. Output the line read from the file on each iteration regardless.
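The epoch conversion step on its own looks like this (a minimal sketch; the exact number printed depends on your timezone):
dstr='02/03/2020 07:53:58.859000'
date -d "$dstr" +%s    # the timestamp as whole seconds since the epoch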
You can write this fairly easily by reading each line from the file with read and then using the parameter expansions for substring removal to trim from the right ']' through the end and from the left through '[', leaving only the date, which can be used with date -d as you have attempted. You can do something similar to:
secs=0 # initialize seconds zero
while read -r line; do # read each line in log
dstr="${line%]*}" # trim from right through ']'
dstr="${dstr#*[}" # trim from left through '['
epoch=$(date -d "$dstr" +%s) # get seconds from epoch from date
if (((epoch-secs) > 2)); then # if current date 2 greater than secs
# if not first line, output the separator
((secs > 0)) && printf -- "------------------------------\n"
secs="$epoch" # update secs to epoch
fi
echo "$line" # output each line
done < file
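Assuming the loop above is saved as a standalone script (the filename below is only an illustration) and the log is in file, it can be run as:
bash logsep.sh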
Example Use/Output
With your input in file you get:
[02/03/2020 07:53:58.859000] 5
[02/03/2020 07:53:58.935300] 2
------------------------------
[02/03/2020 10:04:50.355600] 0
[02/03/2020 10:04:51.028900] 1
------------------------------
[02/03/2020 10:38:13.468200] 6
While this is done relatively simply with date -d and comparisons, for large logs an awk solution using mktime (if you have GNU awk or mawk) will be orders of magnitude faster than a shell-script solution.
Source: https://stackoverflow.com/questions/60498465/how-to-split-log-file-in-bash-based-on-time-condition