search (e.g. awk, grep, sed) for string, then look for X lines above and another string below

Question

I need to be able to search for a string (let's use 4320101), print the 20 lines above it, and then print everything after it until I reach the string </eventUpdate>.

For example:

Random text I do not want or blank line
16 Apr 2013 00:14:15
id="4320101"
</eventUpdate>
Random text I do not want or blank line

I just want the following result written to a file:

16 Apr 2013 00:14:15
id="4320101"
</eventUpdate>

The file contains multiple groups of text like this, and I want all of them.

I tried the following:

cat filename | grep "</eventUpdate>" -A 20 4320101 -B 100 > greptest.txt

But it only ever shows 20 lines on either side of the string.

Notes:
- The line number the text is on varies, so I cannot rely on line numbers; that is why I am using -A 20.
- Ideally, when it searches after the string, it should stop when it finds </eventUpdate> and then carry on searching for the next match.

Summary: find 4320101, output the 20 lines above it (or fewer, stopping at a blank line), and then output all lines below 4320101 up to </eventUpdate>.

Having done some research, I am still unsure how to get awk, nawk or sed to do this.

Answer 1:

This might work for you (GNU sed):

sed ':a;s/\n/&/20;tb;$!{N;ba};:b;/4320101/!D;:c;n;/<\/eventUpdate>/!bc' file

EDIT:

:a;s/\n/&/20;tb;$!{N;ba}; this keeps a window of 21 lines (the current line plus the 20 above it) in the pattern space (PS).
:b;/4320101/!D; this slides the window through the file, one line at a time, until the pattern 4320101 is found.
:c;n;/<\/eventUpdate>/!bc the window is printed, followed by every subsequent line up to and including the one matching <\/eventUpdate>.
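The one-liner is easier to follow when spread over several lines with a comment per step. The sketch below is the same script, assuming GNU sed (which accepts newline-separated commands and `#` comment lines); the wrapper name `extract_sed` is hypothetical.

```shell
#!/bin/sh
# Multi-line, commented form of the GNU sed one-liner above.
# extract_sed is a hypothetical wrapper name; pass file names as
# arguments or pipe the input in on stdin.
extract_sed() {
  sed -e '
    # Keep appending lines (N) until the pattern space holds 21 lines,
    # i.e. the 20th newline exists, or until the last input line.
    :a
    s/\n/&/20
    tb
    $!{
      N
      ba
    }
    # Window full: if it lacks the key, delete its first line (D) and
    # restart the cycle without reading, so the window slides by one.
    :b
    /4320101/!D
    # Key found: n prints the window and reads the next line; repeat
    # until the line just read is the closing tag, which the normal
    # end-of-cycle autoprint then outputs.
    :c
    n
    /<\/eventUpdate>/!bc
  ' "$@"
}
```

For example, `extract_sed filename > greptest.txt`. Note that the key is only tested once the window is full, so a key within the first 21 lines of the input (or of the text following a previous group) prints the whole first window.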

Answer 2:

Here is an ugly awk solution :)

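awk 'BEGIN{last=0}
# remember the last blank line or line containing "Random"
{if((length($0)==0) || ($0 ~ /Random/)) last=NR}
/4320101/{flag=1;
if((NR-last)>21) last=NR-21;   # print at most 20 lines above the key
if(last+1 < NR) {              # skip sed when there is nothing above
  cmd="sed -n \"" (last+1) "," (NR-1) "p\" " FILENAME;
  system(cmd);
}
}
flag==1{print}
/eventUpdate/{flag=0}' filename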

Basically, it keeps track of the last blank line, or line matching the Random pattern, in the variable last. When 4320101 is found, it prints the lines above it, starting either 20 lines up or just after last (whichever is nearer), by running sed through system(), and then sets flag. The flag causes the following lines to be printed until eventUpdate is found. I have not tested it, but it should work.
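The same bookkeeping can be done without re-reading the file through sed: since at most 20 lines above the key are ever printed, awk only needs to remember the last 21 lines. This is a hypothetical single-pass variant (the wrapper name `extract_awk` is made up), keeping the answer's assumptions that unwanted lines are blank or contain "Random".

```shell
#!/bin/sh
# Hypothetical single-pass variant of the awk answer above: the lines
# above the key come from an array instead of a sed call via system().
extract_awk() {
  awk '
    # Remember only the most recent 21 lines, indexed by line number.
    { buf[NR] = $0; delete buf[NR - 21] }
    # A blank or "Random" line bounds how far back we may print.
    length($0) == 0 || /Random/ { last = NR }
    /4320101/ {
      start = last + 1                      # first line after the boundary
      if (NR - start > 20) start = NR - 20  # but never more than 20 lines up
      for (i = start; i < NR; i++) print buf[i]
      flag = 1
    }
    flag { print }                          # key line and what follows,
    /<\/eventUpdate>/ { flag = 0 }          # up to and including the tag
  ' "$@"
}
```

Keeping a bounded buffer trades a little memory for not spawning one sed process per match, and it also works when the input arrives on a pipe rather than from a named file.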

Answer 3:

Look-behind in sed/awk is always tricky. This self-contained awk script keeps the last 20 lines stored. When it reaches 4320101 it prints those stored lines back to the most recent blank or unwanted line, then switches into printall mode and prints every line until eventUpdate is encountered; it prints that line too and quits, so only the first group is extracted.

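awk '
# Shift the window down and append the current line as last[20],
# so last[0..20] holds the 20 previous lines plus the current one.
function store( line,   i ) {
    for( i=0; i < 20; i++ ) {
        last[i] = last[i+1];
    };
    last[20]=line;
};
# Print the window after the most recent blank or "Random" line;
# stop = -1 means none was found, so all 21 entries are printed.
function purge(   i, stop ) {
    stop = -1;
    for( i=20; i >= 0; i-- ) {
        if( length(last[i])==0 || last[i] ~ "Random" ) {
            stop=i;
            break
        };
    };
    for( i=(stop+1); i <= 20; i++ ) {
        print last[i];
    };

};
{
store($0);
if( /4320101/ ) {
    purge();
    printall=1;
    next;
};
if( printall == 1) {
    print;
    if( /eventUpdate/ ) {
        exit 0;
    };
};
}' test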

Answer 4:

Let's see if I understand your requirements:

You have two strings, which I'll call KEY and LIMIT. And you want to print:

1. At most 20 lines before a line containing KEY, but stopping if there is a blank line.
2. All the lines between a line containing KEY and the following line containing LIMIT. (This ignores your requirement that there be no more than 100 such lines; if that's important, it's relatively straightforward to add.)

The easiest way to accomplish (1) is to keep a circular buffer of 20 lines and print it out when you hit KEY. (2) is trivial in either sed or awk, because the two-address form prints the whole range.
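Part (2) on its own really is a one-liner: the two-address form `/first/,/second/` selects every line from a match of the first regex through the next match of the second, and `-n` with `p` prints only those lines. A sketch with the question's strings (the wrapper name `print_ranges` is hypothetical); it does not print the lines above the key, which is what the circular buffer is for.

```shell
#!/bin/sh
# Print each block from a line matching the key through the next closing tag.
# print_ranges is a hypothetical wrapper; input comes from files or stdin.
print_ranges() {
  sed -n '/4320101/,/<\/eventUpdate>/p' "$@"
}
```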

So let's do it in awk:

#file: extract.awk

# Initialize the circular buffer
BEGIN          { count = 0; }
# When we hit an empty line, clear the circular buffer
length() == 0  { count = 0; next; }
# When we hit KEY, print and clear the circular buffer
index($0, KEY) { for (i = count < 20 ? 0 : count - 20; i < count; ++i)
                   print buf[i % 20];
                 count = 0;
               }
# While we're between KEY and LIMIT, print the line
# (the action must start on the same line as the range pattern)
index($0, KEY),index($0, LIMIT) { print; next; }
# Otherwise, save the line
               { buf[count++ % 20] = $0; }

In order to get that to work, we need to set the values of KEY and LIMIT. We can do that on the command line:

awk -v "KEY=4320101" -v "LIMIT=</eventUpdate>" -f extract.awk "$FILENAME"

Notes:

I used index($0, foo) instead of the more usual /foo/, because it avoids having to escape regex special characters, and nothing in the requirements suggests that regexen are even desired. index(haystack, needle) returns the position of needle in haystack, counting from 1, or 0 if needle is not found. Used as a true/false value, it is true if needle is found.
next ends processing of the current line and moves on to the next input line. It can be quite handy, as this little program shows.
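To see what index() buys over a regex, compare the two on input containing a dot, which is a regex metacharacter. A small sketch; the function names `regex_match` and `literal_match` are made up for the illustration.

```shell
#!/bin/sh
# /a.b/ is a regex: "." matches any character, so "axb" matches too.
regex_match()   { awk '/a.b/' "$@"; }
# index() searches for the literal string passed in with -v.
literal_match() { awk -v s='a.b' 'index($0, s)' "$@"; }
```

Fed the two lines `a.b` and `axb`, `regex_match` prints both while `literal_match` prints only `a.b`. One caveat for the -v assignments in the command above: awk processes backslash escapes in -v values, so a KEY containing a backslash would need it doubled.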

Answer 5:

You can try something like this -

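awk '{
    a[NR] = $0
}

END {
    # Walk the saved lines in order; "for (i in a)" has no defined order
    for (i = 1; i <= NR; i++) {
        if (a[i] ~ /4320101/) {
            # x = first closing tag at or after the key line
            x = i
            while (x <= NR && a[x] !~ /<\/eventUpdate>/)
                x++
            # print up to 20 lines above the key, then through the closing tag
            for (j = (i > 20 ? i - 20 : 1); j <= x && j <= NR; j++)
                print a[j]
            i = x
        }
    }
}' file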

Answer 6:

The simplest way is to make two passes over the file: the first records the line numbers of every range in which your target regexp is found, and the second prints the lines in those ranges, e.g.:

awk '
NR==FNR {
    # \< and \> are GNU awk word-boundary operators
    if ($0 ~ /\<4320101\>/) {
        for (i=NR-20;i<NR;i++)
            range[i]
        inRange = 1
    }
    if (inRange) {
        range[NR]
    }
    if ($0 ~ /<\/eventUpdate>/) {
        inRange = 0
    }
    next
}
FNR in range
' file file

Source: https://stackoverflow.com/questions/16694469/search-e-g-awk-grep-sed-for-string-then-look-for-x-lines-above-and-another

Tags

bash

sed

awk

grep

nawk