Multiple line, repeated occurrence matching

Question

I am referring to the question below, but my case is slightly different. I only want the line containing "abc" when a later line matches "efg", and I only need the last "abc" line before each "efg" match.

How to find patterns across multiple lines using grep?

blah blah..
blah blah..
blah abc blah1
blah blah..
blah blah..
blah abc blah2
blah blah..
blah efg1 blah blah
blah efg2 blah blah
blah blah..
blah blah..

blah abc blah3
blah blah..
blah blah..
blah abc blah4
blah blah..
blah blah blah

blah abc blah5
blah blah..
blah blah..
blah abc blah6
blah blah..
blah efg3 blah blah

blah efg4 blah blah
blah abc blah7
blah blah..
blah blah..
blah abc blah8
blah blah..

Expected output

blah abc blah2
blah abc blah6

Answer 1:

This might work for you (GNU sed):

sed -n '/abc/h;/efg/!b;x;/abc/p;z;x' file

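Store the latest abc line in the hold space (HS). On a line containing efg, swap to the HS and, if it contains abc, print it. The z command then empties that buffer before swapping back, so a second efg line with no new abc line in between prints nothing.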
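The z command is a GNU extension, so here is a minimal sketch of the same remember-and-flush logic in awk, which should also run where GNU sed is not available. This is a swapped-in technique, not the answer's own method, and the function name last_abc_before_efg is hypothetical.

```shell
# Hypothetical helper: for each "efg" line, print the most recent "abc"
# line seen since the previous "efg" line (if there was one).
last_abc_before_efg() {
    awk '
        # Remember the latest line containing abc.
        /abc/ { last = $0; have = 1 }
        # On efg, print the remembered line once, then forget it so that
        # a following efg line without a new abc line prints nothing.
        /efg/ { if (have) print last; have = 0 }
    ' "$@"
}
```

Call it as last_abc_before_efg file, or pipe the text into it with no arguments.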

Answer 2:

I can see how to do this in two steps. The first identifies the abc ... efg clusters, each of which may contain several abc lines. The second strips each cluster down to the two lines that matter.

Important: make sure the input contains no consecutive newlines (\n\n), as that will break the perl step.
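One way to meet that precondition, assuming empty lines carry no meaning in the data, is to delete them before running the pipeline. The helper name strip_empty_lines is hypothetical and is only a sketch of that preprocessing step.

```shell
# Hypothetical helper: drop every empty line so the input never
# contains two newlines in a row.
strip_empty_lines() {
    sed '/^$/d' "$@"
}
```

For example, strip_empty_lines text > text.clean produces a cleaned copy that the grep step can read instead of text.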

grep -Pzo '(.*abc.*)\n(.*\n)*?(.*efg.*\n)' text | perl -0777 -pe 's/(.+\n)*(.*abc.*\n)(.+\n)*?(.*efg.*\n)\n/$2$4/g'

For example:

$ grep -Pzo '(.*abc.*)\n(.*\n)*?(.*efg.*\n)' text
blah abc blah1
blah blah..
blah blah..
blah abc blah2
blah blah..
blah efg1 blah blah

blah abc blah3
blah blah..
blah blah..
blah abc blah4
blah blah..
blah blah blah
blah abc blah5
blah blah..
blah blah..
blah abc blah6
blah blah..
blah efg3 blah blah

See how the efg chunks are separated by two newlines? We then remove the cruft that doesn't matter with a perl search-and-replace regex:

$ grep -Pzo '(.*abc.*)\n(.*\n)*?(.*efg.*\n)' text | perl -0777 -pe 's/(.+\n)*(.*abc.*\n)(.+\n)*?(.*efg.*\n)\n/$2$4/g'
blah abc blah2
blah efg1 blah blah
blah abc blah6
blah efg3 blah blah

If you only want the abc line, keep just $2 in the replacement (drop $4).

$ grep -Pzo '(.*abc.*)\n(.*\n)*?(.*efg.*\n)' text | perl -0777 -pe 's/(.+\n)*(.*abc.*\n)(.+\n)*?(.*efg.*\n)\n/$2/g'
blah abc blah2
blah abc blah6

Source: https://stackoverflow.com/questions/35616661/multiple-line-repeated-occurence-matching

Tags

regex

sed

grep

cygwin