bash: using awk or sed to search backwards from an occurrence to a specific string

Question

I have an XML file and am searching it for a string. If the string is found, I need to search backwards from that point to the position of another string and output the data in between.

For example:

<xml>
<packet>
 <proto>
 <field show="bob">
 </proto>
</packet>
<packet>
 <proto>
 <field show="rumpelstiltskin">
 </proto>
</packet>
<packet>
 <proto>
 <field show="peter">
 </proto>
</packet>

My input would be known:

show="rumpelstiltskin"

and

<packet>

I need to get one of the following results (the first is basically the whole second block, the second is the same block cut off at the matching line):

<packet>
<proto>
<field show="rumpelstiltskin">
</proto>
</packet>

<packet>
<proto>
<field show="rumpelstiltskin">

There are thousands of <packet> sections (the file is a Wireshark PDML conversion), show="rumpelstiltskin" can occur anywhere in the file, and the section can be of any arbitrary size.

I've done this before and am pretty sure it's possible with an awk or sed one-liner. Any help appreciated!

Answer 1:

So ... you COULD hack something together that would do basic parsing of your file as a text file...

awk -v txt="rumpel" '$0=="<packet>"{s=$0; found=0; next} $0~txt{found=1} {s=s RS $0} $0=="</packet>" && found {print s}' inp.xml

Broken out into pieces for easier explanation, this does the following:

-v txt="rumpel" - sets a variable for use within the script. Note that this will be evaluated as a regex in this example, but you could use index() if you prefer to search for it as a string.
$0=="<packet>"{s=$0; found=0; next} - If we find the start of a packet, reset our storage variable (s) and flag (found).
$0~txt{found=1} - If we find the text we're looking for, set a flag.
{s=s RS $0} - Append the current line to the storage variable, and
$0=="</packet>" && found {print s} - if we're at the end of a packet and the string was found in it, print the stored block.
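The index() alternative mentioned above can be sketched as follows. This is only an illustration under assumptions: the sample data is copied from the question into a temporary file, and the variable names (sample, result) are made up here. index($0, txt) returns a non-zero position when txt occurs literally in the line, so characters such as . or [ in the search string are not treated as regex operators.

```shell
# Sample input copied from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<xml>
<packet>
 <proto>
 <field show="bob">
 </proto>
</packet>
<packet>
 <proto>
 <field show="rumpelstiltskin">
 </proto>
</packet>
<packet>
 <proto>
 <field show="peter">
 </proto>
</packet>
EOF

# Same logic as the one-liner, but with a literal substring test
# (index) instead of a regex match ($0 ~ txt).
result=$(awk -v txt='show="rumpelstiltskin"' '
  $0 == "<packet>"            { s = $0; found = 0; next }  # block start: reset buffer and flag
  index($0, txt)              { found = 1 }                # literal match inside this block
                              { s = s RS $0 }              # buffer the current line
  $0 == "</packet>" && found  { print s }                  # block end: print if matched
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Note that awk still processes backslash escapes in a value passed with -v, so a search string containing backslashes would need extra care.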

A better approach would likely be to interpret the XML using something that understands XML natively, but that isn't possible with just sed and awk.

Answer 2:

You need to treat your XML as XML and use an appropriate tool. For example, modifying your XML slightly to make it valid:

<xml>
  <packet>
    <proto>
      <field show="bob"/>
    </proto>
  </packet>
  <packet>
    <proto>
      <field show="rumpelstiltskin"/>
    </proto>
  </packet>
  <packet>
    <proto>
      <field show="peter"/>
    </proto>
  </packet>
</xml>

You could use xmllint like this:

xmllint --xpath '//packet[proto/field/@show="rumpelstiltskin"]' file.xml

This matches and prints every <packet> element that contains a <field show="rumpelstiltskin"> within a <proto> element.

If you don't want to specify the complete hierarchy, you can use something like this instead:

xmllint --xpath '//packet[descendant::field[@show="rumpelstiltskin"]]' file.xml
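Since the search value is known in advance, it may be convenient to build the XPath expression from a shell variable. The following is a minimal sketch, not a definitive implementation: packet_xpath is a hypothetical helper name, and it assumes the value contains no double quote, because nothing is escaped.

```shell
# Hypothetical helper: print an XPath that selects every <packet> element
# having a descendant <field> whose show attribute equals the argument.
# Assumption: the value contains no double quote (no escaping is done).
packet_xpath() {
  printf '//packet[descendant::field[@show="%s"]]' "$1"
}

needle='rumpelstiltskin'
xpath=$(packet_xpath "$needle")
printf '%s\n' "$xpath"

# Run the query only if xmllint is installed and file.xml exists.
if command -v xmllint >/dev/null 2>&1 && [ -f file.xml ]; then
  xmllint --xpath "$xpath" file.xml
fi
```

The expression is passed to xmllint in double quotes so that the shell expands the variable but does not split it into separate words.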

Answer 3:

You could do this with grep:

grep -B5 'show="rumpelstiltskin"' file | grep 'otherstring'

Adjust -B5 to however many lines of leading context are needed to reach the string you are looking for.
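Because the section can be of arbitrary size, a fixed -B count may be too small or too large. One possible alternative, sketched here under the assumption that the search string occurs at least once, is to look up line numbers with grep -n and search backwards for the nearest <packet> line. The sample file and the variable names (sample, match, start, result) are illustrative only.

```shell
# Sample input copied from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<xml>
<packet>
 <proto>
 <field show="bob">
 </proto>
</packet>
<packet>
 <proto>
 <field show="rumpelstiltskin">
 </proto>
</packet>
<packet>
 <proto>
 <field show="peter">
 </proto>
</packet>
EOF

# Line number of the first line containing the search string.
match=$(grep -n 'show="rumpelstiltskin"' "$sample" | head -n 1 | cut -d: -f1)

# Among the lines up to and including the match, the last one containing
# <packet> is the nearest block start before it.
start=$(head -n "$match" "$sample" | grep -n '<packet>' | tail -n 1 | cut -d: -f1)

# Print everything from the block start through the matching line.
result=$(sed -n "${start},${match}p" "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

On the sample data this prints the <packet> line, the <proto> line and the matching <field> line, whatever the distance between them.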

Answer 4:

If your input really is that simple, all you need is:

$ awk '/<packet>/{buf=""} {buf=buf $0 RS} /rumpelstiltskin/{printf "%s",buf}' file
<packet>
 <proto>
 <field show="rumpelstiltskin">

or if you prefer:

$ awk '/<packet>/{buf="";f=0} {buf=buf $0 RS} /rumpelstiltskin/{f=1} f&&/<\/packet>/{printf "%s",buf}' file
<packet>
 <proto>
 <field show="rumpelstiltskin">
 </proto>
</packet>

and if you want to stop reading the input file after the first print then just add ;exit after it so printf "%s",buf becomes printf "%s",buf; exit.
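Written out in full, the first command with exit added might look like the sketch below. The sample file is made up for illustration: it contains two matching packets, and the n attribute is a hypothetical addition that only serves to tell them apart.

```shell
# Hypothetical sample with two matching packets; the n attribute is
# not in the original data and only distinguishes the two blocks.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<packet>
 <proto>
 <field show="rumpelstiltskin" n="1">
 </proto>
</packet>
<packet>
 <proto>
 <field show="rumpelstiltskin" n="2">
 </proto>
</packet>
EOF

# Buffer lines from each <packet> onward; on the first match, print the
# buffer and stop reading the file.
result=$(awk '
  /<packet>/        { buf = "" }                 # block start: clear the buffer
                    { buf = buf $0 RS }          # buffer the current line
  /rumpelstiltskin/ { printf "%s", buf; exit }   # first match: print and quit
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Only the first block is printed, up to and including its matching line; without exit the second block would be printed as well.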

Answer 5:

This might work for you (GNU sed):

sed '/<packet>/h;//!H;/rumpelstiltskin/!d;x;q' file

This stores the required strings in the hold space, prints them out and quits.

However, to be sure that both the first and second strings exist in the collected text, with the first preceding the second:

sed '/<packet>/h;//!H;/rumpelstiltskin/!d;x;/<packet>.*rumpelstiltskin/!d;q' file
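For readers less familiar with the hold space, here is the first sed command again, spread over several lines with a comment per step. It is intended as the same script, only annotated; like the answer, it assumes GNU sed, and the sample file and the result variable are illustrative.

```shell
# Sample input copied from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<xml>
<packet>
 <proto>
 <field show="bob">
 </proto>
</packet>
<packet>
 <proto>
 <field show="rumpelstiltskin">
 </proto>
</packet>
<packet>
 <proto>
 <field show="peter">
 </proto>
</packet>
EOF

result=$(sed '
  # Line contains <packet>: overwrite the hold space with it (fresh block).
  /<packet>/h
  # The empty regex reuses the last regex, i.e. /<packet>/, so every
  # other line is appended to the hold space.
  //!H
  # Not the target line: delete it and start the next cycle, printing nothing.
  /rumpelstiltskin/!d
  # Target line reached: swap the accumulated block into the pattern space.
  x
  # Quit; sed prints the pattern space on the way out.
  q
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Because of the q command, the rest of the file is never read once the first match has been printed.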

Source: https://stackoverflow.com/questions/40639004/bash-using-awk-or-sed-to-search-backwards-from-occurance-to-a-specific-string

Tags

bash

awk

sed