Question
I have a log file with a lot of lines in this format:
10.87.113.12 - - [2019-12-09T11:41:07.197Z] "DELETE /page/sub1.php?id=alice HTTP/1.1" 401 275 "-" "alice/7.61.1"
10.87.113.12 - - [2019-12-09T11:41:07.197Z] "DELETE /page/sub1.php?id=alice HTTP/1.1" 401 275 "-" "alice/7.61.1"
10.87.113.12 - - [2019-12-09T11:43:51.008Z] "POST /page/sub2.php?id=alice&jw_token=07e876afdc2245b53214fff0d4763730 HTTP/1.1" 200 275 "-" "alice/7.61.1"
My objective is simple: I want to output Alice's jw_token, and that's it.
So, my logic is that I need to find the lines that include id=alice and a status code of 200, then return the value of jw_token.
I actually managed to do this, but only with this absolute monstrosity of a line:
$ grep "id=alice" main.log | grep 200 | grep -o "n=.* " | sed "s/.*=//g" | sed "s/ .*$//g" | uniq
07e876afdc2245b53214fff0d4763730
This looks horrible, and may also break on a number of things (for instance, if "200" happens to appear anywhere else on the line). I know grep -P could have cleaned it up somewhat, but unfortunately that flag isn't available on my Mac.
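For what it's worth, the grep that ships with macOS does support -o and -E, so a -P-free sketch along these lines is possible, assuming the token is always lowercase hex and the status code field is always space-delimited:

grep 'id=alice' main.log | grep ' 200 ' | grep -oE 'jw_token=[a-f0-9]+' | cut -d= -f2 | uniq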
I also did it by including Python, like this:
cat << EOF > analyzer.py
import re

# Print the first jw_token found on a line containing id=alice and status 200
with open('main.log') as f:
    for line in f:
        if "id=alice" in line and " 200 " in line:
            print(re.search(r'(?<=jw_token=).*?(?=\s)', line).group())
            break
EOF
python3 analyzer.py && rm analyzer.py
(This was actually MUCH (orders of magnitude) faster than the previous line with grep and sed. Why?)
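A plausible explanation, not benchmarked here: the Python script stops at the first match thanks to the break, so it reads only the start of the file, while the grep pipeline scans the whole file through six separate processes. Assuming analyzer.py hasn't been deleted yet, the two can be timed side by side:

time (grep "id=alice" main.log | grep 200 | grep -o "n=.* " | sed "s/.*=//g" | sed "s/ .*$//g" | uniq)
time python3 analyzer.py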
Surely there are ways to make this a lot cleaner and prettier. How?
Answer 1:
You can achieve this by using just one grep and one sed, with this command:
grep -E 'id=alice&jw_token=.* HTTP\/1.1" 200' main.log | sed -E 's/.*id=alice&jw_token=([a-zA-Z0-9]+).*/\1/' | uniq
Here the first part, grep -E 'id=alice&jw_token=.* HTTP\/1.1" 200' main.log, keeps only the lines that contain alice's token with status 200, and the second part, sed -E 's/.*id=alice&jw_token=([a-zA-Z0-9]+).*/\1/', captures the token in group 1 and replaces the whole line with just the token.
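To see the capture on its own, you can feed the sample line from the question straight into the sed part:

echo '10.87.113.12 - - [2019-12-09T11:43:51.008Z] "POST /page/sub2.php?id=alice&jw_token=07e876afdc2245b53214fff0d4763730 HTTP/1.1" 200 275 "-" "alice/7.61.1"' | sed -E 's/.*id=alice&jw_token=([a-zA-Z0-9]+).*/\1/'
07e876afdc2245b53214fff0d4763730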
Answer 2:
Could you please try the following; this should be an easy task for awk, in case you are OK with awk.
awk '
/alice/ && match($0,/jw_token=[^ ]* HTTP\/1\.1" 200/){
  # match() sets RSTART/RLENGTH; "jw_token=" is 9 characters,
  # so skip past the "=" and take the rest of the matched text
  val=substr($0,RSTART+9,RLENGTH-9)
  # val is the token followed by the trailing match text; keep the first field
  split(val,array," ")
  print array[1]
  delete array
}' Input_file
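With the question's sample log saved as Input_file, this should print:

07e876afdc2245b53214fff0d4763730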
Answer 3:
If you're open to a Perl one-liner:
perl -ane '/id=alice&jw_token=([a-f0-9]+).+\b200\b/ && $h{$1}++;END{print"$_\n" for sort(keys %h)}' file.txt
07e876afdc2245b53214fff0d4763730
Explanation:
/                    # regex delimiter
id=alice&jw_token=   # matched literally
([a-f0-9]+)          # capture group 1: one or more hex digits
.+                   # one or more of any character
\b200\b              # 200 surrounded by word boundaries
/                    # regex delimiter; add /i for case-insensitive matching
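A nice side effect of collecting the tokens into the %h hash: duplicates are removed even when they are not adjacent in the file, whereas the uniq-based pipelines above only collapse consecutive duplicates.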
Answer 4:
Would you try the following:
grep "id=alice.* 200 " main.log | sed 's/.*jw_token=\([^ ]\{1,\}\).*/\1/' | uniq
Source: https://stackoverflow.com/questions/59332710/effective-grep-of-log-file