I want to get the "GET" queries from my server logs.
For example, this is the server log:
1.0.0.127.in-addr.arpa - - [10/Jun/2012
It's often easier to use a pipeline rather than a single complex regular expression. This works on the data you provided:
grep -F GET /tmp/foo |
grep -Eo 'GET (.*) HTTP' |
sed -E 's/^GET \/(.+) HTTP/\1/'
This pipeline returns the following results:
hello
ss
There are certainly other ways to get the job done, but this patently works on the provided corpus.
In this case, since the log file has a known structure, one option is to use cut to pull out the 7th column. cut splits on tabs by default, so a space-separated log needs an explicit delimiter with -d ' '.
grep GET log.txt | cut -d ' ' -f 7
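As a rough sketch of the column approach: the log lines below are invented for illustration and assume Common Log Format, since the sample in the question is truncated. With single spaces as the delimiter, the request path lands in field 7, leading slash included.

```shell
# Hypothetical log lines (assumed Common Log Format, not the asker's real data).
log=$(mktemp)
cat > "$log" <<'EOF'
host.example - - [10/Jun/2012:10:00:00 +0000] "GET /hello HTTP/1.1" 200 12
host.example - - [10/Jun/2012:10:00:01 +0000] "POST /form HTTP/1.1" 200 34
host.example - - [10/Jun/2012:10:00:02 +0000] "GET /ss HTTP/1.1" 404 56
EOF

# Splitting on single spaces: field 6 is '"GET' and field 7 is the path.
paths=$(grep -F '"GET ' "$log" | cut -d ' ' -f 7)
printf '%s\n' "$paths"

rm -f "$log"
```

Unlike awk, cut does not collapse repeated delimiters, so this only holds when fields are separated by exactly one space.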
I was trying to do this and came across this link: https://www.unix.com/shell-programming-and-scripting/153101-print-next-word-after-found-pattern.html
Summary: use grep to find matching lines, then use awk to find the pattern and print the next field:
grep pattern logfile | \
awk '{for(i=1; i<=NF; i++) if($i~/pattern/) print $(i+1)}'
If you want to count the unique occurrences:
grep pattern logfile | \
awk '{for(i=1; i<=NF; i++) if($i~/pattern/) print $(i+1)}' | \
sort | \
uniq -c
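Here is a small worked example of the grep, awk, sort, uniq chain, using "GET" as the pattern. The log lines are made up for illustration (assumed Common Log Format). Anchoring the awk pattern to the whole field and stopping the loop before the last field are choices made for this sketch, not part of the linked answer.

```shell
# Hypothetical log lines (assumed Common Log Format, not the asker's real data).
log=$(mktemp)
cat > "$log" <<'EOF'
host.example - - [10/Jun/2012:10:00:00 +0000] "GET /hello HTTP/1.1" 200 12
host.example - - [10/Jun/2012:10:00:01 +0000] "GET /ss HTTP/1.1" 404 56
host.example - - [10/Jun/2012:10:00:02 +0000] "GET /hello HTTP/1.1" 200 12
EOF

# Find the field that is exactly '"GET' and print the field after it.
# The loop stops at NF-1 so that $(i+1) always exists.
# The final awk only normalises the padding that uniq -c puts before the count.
counts=$(grep -F GET "$log" |
    awk '{ for (i = 1; i < NF; i++) if ($i ~ /^"GET$/) print $(i + 1) }' |
    sort |
    uniq -c |
    awk '{ print $1, $2 }')
printf '%s\n' "$counts"

rm -f "$log"
```

On these three lines the result is a count of 2 for /hello and 1 for /ss.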
Assuming you have GNU grep, you can use a Perl-style regex with a positive lookbehind:
grep -oP '(?<=GET\s/)\w+' file
If you don't have GNU grep, then I'd advise just using sed:
sed -n '/^.*GET[[:space:]]\{1,\}\/\([-_[:alnum:]]\{1,\}\).*$/s//\1/p' file
If you happen to have GNU sed, that can be greatly simplified:
sed -n '/^.*GET\s\+\/\(\w\+\).*$/s//\1/p' file
The bottom line here is, you certainly don't need pipes to accomplish this. grep or sed alone will suffice.
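To illustrate the single-command point, here is the portable sed expression from above run on some invented log lines (assumed Common Log Format). get_paths is a hypothetical helper name used only for this sketch.

```shell
# get_paths is a hypothetical wrapper around the POSIX sed command shown above.
# The empty regex in s//\1/ reuses the address regex, so the pattern is written once.
get_paths() {
    sed -n '/^.*GET[[:space:]]\{1,\}\/\([-_[:alnum:]]\{1,\}\).*$/s//\1/p' "$1"
}

# Hypothetical log lines (not the asker's real data).
log=$(mktemp)
cat > "$log" <<'EOF'
host.example - - [10/Jun/2012:10:00:00 +0000] "GET /hello HTTP/1.1" 200 12
host.example - - [10/Jun/2012:10:00:01 +0000] "POST /form HTTP/1.1" 200 34
host.example - - [10/Jun/2012:10:00:02 +0000] "GET /ss HTTP/1.1" 404 56
EOF

result=$(get_paths "$log")
printf '%s\n' "$result"

rm -f "$log"
```

Lines without a GET request print nothing, because -n suppresses default output and the p flag fires only on a successful substitution.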
gawk '{match($7,/\/(\w+)/,a);} length(a[1]){print a[1]}' log.txt
hello
ss
If you have gawk, the command above uses the match function to select the desired value with a regex and store it in the array a.
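Use a pipe if you use grep:
grep -o '/he.*' log.txt | grep -o '[^/].*'
grep -o '/ss' log.txt | grep -o '[^/].*'
[^/] is a negated bracket expression: it matches any single character other than /, so the second grep starts its match at the first character after the leading slash and drops that slash from the output.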
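A quick way to see what the bracket expression does is to feed grep a fixed string instead of a log file. The input strings here are made up for illustration.

```shell
# '[^/]' matches one character that is not a slash; '.*' then takes the rest
# of the line, so the match starts just after the leading slash.
stripped=$(printf '%s\n' '/hello' | grep -o '[^/].*')
printf '%s\n' "$stripped"

# Only the leading slash is skipped; later slashes are kept because '.*'
# matches any character.
rest=$(printf '%s\n' '/ss/extra' | grep -o '[^/].*')
printf '%s\n' "$rest"
```

Quoting the pattern keeps the shell from treating [, ] and * as glob characters before grep sees them.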