An easy way to diff log files, ignoring the time stamps?

终归单人心 2021-02-01 13:41

I need to diff two log files but ignore the time stamp part of each line (the first 12 characters, to be exact). Is there a good tool, or a clever awk command, that could help me?

5 answers
  •  轮回少年
    2021-02-01 14:06

    Answers using cut are fine, but sometimes it is useful to keep the timestamps in the diff output. Since the OP's question is about ignoring the time stamps (not removing them), here is my trick command line:

    diff -I '^#' <(sed -r 's/^((.){12})/#\1\n/' 1.log) <(sed -r 's/^((.){12})/#\1\n/' 2.log)
    
    • sed, run inside a process substitution, isolates each timestamp on its own line (# before it and \n after it)
    • diff -I '^#' ignores changes whose lines all carry these timestamps (lines beginning with #)
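
To see what diff actually receives, here is a minimal sketch that feeds one sample log line through the same sed expression. It assumes GNU sed, since both -r and \n in the replacement are GNU extensions.

```shell
# Run a single sample line through the sed expression from the command above.
# GNU sed is assumed (-r and "\n" in the replacement are GNU extensions).
printf '09:06:00.000 data 6\n' | sed -r 's/^((.){12})/#\1\n/'

# The 12-character timestamp ends up alone on a line starting with '#',
# and the remainder of the log line follows on its own line:
#   #09:06:00.000
#    data 6
```

Each log line therefore becomes two lines, which is why the line numbers in the diff output further down are roughly twice the original ones.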

    example

    Two log files having same content but different timestamps:

    $> for ((i=1;i<11;i++)) do echo "09:0${i::1}:00.000 data $i"; done > 1.log
    $> for ((i=1;i<11;i++)) do echo "11:00:0${i::1}.000 data $i"; done > 2.log
    

    A plain diff reports every line as different:

    $> diff 1.log 2.log
    1,10c1,10
    < 09:01:00.000 data 1
    < 09:02:00.000 data 2
    < 09:03:00.000 data 3
    < 09:04:00.000 data 4
    < 09:05:00.000 data 5
    < 09:06:00.000 data 6
    < 09:07:00.000 data 7
    < 09:08:00.000 data 8
    < 09:09:00.000 data 9
    < 09:01:00.000 data 10
    ---
    > 11:00:01.000 data 1
    > 11:00:02.000 data 2
    > 11:00:03.000 data 3
    > 11:00:04.000 data 4
    > 11:00:05.000 data 5
    > 11:00:06.000 data 6
    > 11:00:07.000 data 7
    > 11:00:08.000 data 8
    > 11:00:09.000 data 9
    > 11:00:01.000 data 10
    

    Our tricky diff -I '^#' does not display any difference (timestamps ignored):

    $> diff -I '^#' <(sed -r 's/^((.){12})/#\1\n/' 1.log) <(sed -r 's/^((.){12})/#\1\n/' 2.log)
    $>
    

    Change 2.log (replace data with foo on the 6th line) and check again:

    $> sed '6s/data/foo/' -i 2.log
    $> diff -I '^#' <(sed -r 's/^((.){12})/#\1\n/' 1.log) <(sed -r 's/^((.){12})/#\1\n/' 2.log)
    11,13c11,13
    < #09:06:00.000
    <  data 6
    < #09:07:00.000
    ---
    > #11:00:06.000
    >  foo 6
    > #11:00:07.000
    

    => timestamps are kept in the diff output!
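
For contrast, here is a rough sketch of the cut-based approach mentioned at the top of this answer. The three-line sample logs and the scratch directory are made up for illustration. Temporary files are used instead of process substitution only so the snippet does not depend on bash.

```shell
# Hypothetical three-line sample logs in a scratch directory.
tmp=$(mktemp -d)
printf '09:05:00.000 data 5\n09:06:00.000 data 6\n09:07:00.000 data 7\n' > "$tmp/1.log"
printf '11:00:05.000 data 5\n11:00:06.000 foo 6\n11:00:07.000 data 7\n' > "$tmp/2.log"

# cut -c 13- drops the first 12 characters (the timestamp) of every line.
# In bash the same idea is: diff <(cut -c 13- 1.log) <(cut -c 13- 2.log)
cut -c 13- "$tmp/1.log" > "$tmp/1.cut"
cut -c 13- "$tmp/2.log" > "$tmp/2.cut"

# diff exits with status 1 when the files differ, hence the "|| true".
result=$(diff "$tmp/1.cut" "$tmp/2.cut") || true
printf '%s\n' "$result"

rm -rf "$tmp"
```

The changed line is reported, but the timestamps are gone from the output, so you can no longer tell when the differing entry was logged.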

    You can also get a side-by-side view with the -y (or --side-by-side) option:

    $> diff -y -I '^#' <(sed -r 's/^((.){12})/#\1\n/' 1.log) <(sed -r 's/^((.){12})/#\1\n/' 2.log)
    #09:01:00.000                   #11:00:01.000
     data 1                          data 1
    #09:02:00.000                   #11:00:02.000
     data 2                          data 2
    #09:03:00.000                   #11:00:03.000
     data 3                          data 3
    #09:04:00.000                   #11:00:04.000
     data 4                          data 4
    #09:05:00.000                   #11:00:05.000
     data 5                          data 5
    #09:06:00.000                 | #11:00:06.000
     data 6                       |  foo 6
    #09:07:00.000                 | #11:00:07.000
     data 7                          data 7
    #09:08:00.000                   #11:00:08.000
     data 8                          data 8
    #09:09:00.000                   #11:00:09.000
     data 9                          data 9
    #09:01:00.000                   #11:00:01.000
     data 10                         data 10
    

    old sed

    If your sed implementation does not support the -r option, you can spell out the twelve dots instead, <(sed 's/^\(............\)/#\1\n/' 1.log), or use another pattern of your choice ;)
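
Since the question also asked about awk, here is one possible alternative for the splitting step that uses only POSIX awk. It may help where sed lacks -r or does not understand \n in the replacement. The function name split_ts is hypothetical, and the sketch has only been reasoned through for lines of at least 12 characters.

```shell
# Hypothetical helper: print '#' plus the first 12 characters of each line,
# then the rest of the line on a line of its own (same shape as the sed output).
# Caveat: unlike the sed pattern, lines shorter than 12 characters are split
# too (the whole line goes after '#', followed by an empty line).
split_ts() {
    awk '{ print "#" substr($0, 1, 12); print substr($0, 13) }' "$@"
}

# Reads stdin when no file is given.
printf '09:06:00.000 data 6\n' | split_ts

# In bash it can stand in for sed inside the process substitutions:
#   diff -I '^#' <(split_ts 1.log) <(split_ts 2.log)
```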
