Question
I have a log file with a lot of lines in this format:
10.87.113.12 - - [2019-12-09T11:41:07.197Z] "DELETE /page/sub1.php?id=alice HTTP/1.1" 401 275 "-" "alice/7.61.1"
10.87.113.12 - - [2019-12-09T11:41:07.197Z] "DELETE /page/sub1.php?id=alice HTTP/1.1" 401 275 "-" "alice/7.61.1"
10.87.113.12 - - [2019-12-09T11:43:51.008Z] "POST /page/sub2.php?id=alice&jw_token=07e876afdc2245b53214fff0d4763730 HTTP/1.1" 200 275 "-" "alice/7.61.1"
My objective is simple: I want to output Alice's jw_token, and that's it.
So, my logic is that I need to find the lines that include id=alice and a status code of 200, then return the value of jw_token.
I actually managed to do this, but only with this absolute monstrosity of a line:
$ grep "id=alice" main.log | grep 200 | grep -o "n=.* " | sed "s/.*=//g" | sed "s/ .*$//g" | uniq
07e876afdc2245b53214fff0d4763730
This looks horrible, and may also break on a number of things (for instance, if "200" happens to appear anywhere else on the line). I know grep -P could have cleaned it up somewhat, but unfortunately that flag isn't available on my Mac.
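For what it's worth, the grep that ships with macOS does support -o and -E, so a -P-free sketch along these lines is possible, assuming the token is always lowercase hex and the status code field is always space-delimited:

grep 'id=alice' main.log | grep ' 200 ' | grep -oE 'jw_token=[a-f0-9]+' | cut -d= -f2 | uniq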
I also did it by including Python, like this:
cat << EOF > analyzer.py
import re

# Print the first jw_token found on a line containing id=alice and status 200
with open('main.log') as f:
    for line in f:
        if "id=alice" in line and " 200 " in line:
            print(re.search(r'(?<=jw_token=).*?(?=\s)', line).group())
            break
EOF
python3 analyzer.py && rm analyzer.py
(This was actually MUCH (orders of magnitude) faster than the previous line with grep and sed. Why?)
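A plausible explanation, not benchmarked here: the Python script stops at the first match thanks to the break, so it reads only the start of the file, while the grep pipeline scans the whole file through six separate processes. Assuming analyzer.py hasn't been deleted yet, the two can be timed side by side:

time (grep "id=alice" main.log | grep 200 | grep -o "n=.* " | sed "s/.*=//g" | sed "s/ .*$//g" | uniq)
time python3 analyzer.py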
Surely there are ways to make this a lot cleaner and prettier. How?
Answer 1:
You can achieve this by using just one grep and one sed, with this command:
grep -E 'id=alice&jw_token=.* HTTP\/1.1" 200' main.log | sed -E 's/.*id=alice&jw_token=([a-zA-Z0-9]+).*/\1/' | uniq
Here the first part, grep -E 'id=alice&jw_token=.* HTTP\/1.1" 200' main.log, keeps only the lines that contain alice's token with status 200, and the second part, sed -E 's/.*id=alice&jw_token=([a-zA-Z0-9]+).*/\1/', captures the token in group 1 and replaces the whole line with just the token.
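To see the capture on its own, you can feed the sample line from the question straight into the sed part:

echo '10.87.113.12 - - [2019-12-09T11:43:51.008Z] "POST /page/sub2.php?id=alice&jw_token=07e876afdc2245b53214fff0d4763730 HTTP/1.1" 200 275 "-" "alice/7.61.1"' | sed -E 's/.*id=alice&jw_token=([a-zA-Z0-9]+).*/\1/'
07e876afdc2245b53214fff0d4763730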
Answer 2:
Could you please try the following; this should be an easy task for awk, in case you are OK with awk.
awk '
/alice/ && match($0,/jw_token=[^ ]* HTTP\/1\.1" 200/){
  # match() sets RSTART/RLENGTH; "jw_token=" is 9 characters,
  # so skip past the "=" and take the rest of the matched text
  val=substr($0,RSTART+9,RLENGTH-9)
  # val is the token followed by the trailing match text; keep the first field
  split(val,array," ")
  print array[1]
  delete array
}' Input_file
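With the question's sample log saved as Input_file, this should print:

07e876afdc2245b53214fff0d4763730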
Answer 3:
If you're open to a Perl one-liner:
perl -ane '/id=alice&jw_token=([a-f0-9]+).+\b200\b/ && $h{$1}++;END{print"$_\n" for sort(keys %h)}' file.txt
07e876afdc2245b53214fff0d4763730
Explanation:
/                    # regex delimiter
id=alice&jw_token=   # matched literally
([a-f0-9]+)          # capture group 1: one or more hex digits
.+                   # one or more of any character
\b200\b              # 200 surrounded by word boundaries
/                    # regex delimiter; add /i for case-insensitive matching
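A nice side effect of collecting the tokens into the %h hash: duplicates are removed even when they are not adjacent in the file, whereas the uniq-based pipelines above only collapse consecutive duplicates.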
Answer 4:
Would you try the following:
grep "id=alice.* 200 " main.log | sed 's/.*jw_token=\([^ ]\{1,\}\).*/\1/' | uniq
Source: https://stackoverflow.com/questions/59332710/effective-grep-of-log-file