Question
I have an XML file and am searching it for a string. If the string is found, I need to search backwards to the position of another string and output the data in between.
For example:
<xml>
<packet>
<proto>
<field show="bob">
</proto>
</packet>
<packet>
<proto>
<field show="rumpelstiltskin">
</proto>
</packet>
<packet>
<proto>
<field show="peter">
</proto>
</packet>
My input would be known:
show="rumpelstiltskin"
and
<packet>
I need to get the following result (which is basically the second block):
<packet>
<proto>
<field show="rumpelstiltskin">
</proto>
</packet>
or
<packet>
<proto>
<field show="rumpelstiltskin">
There are thousands of <packet> sections (Wireshark PDML conversion), the show="rumpelstiltskin" can occur anywhere in the file, and each section can be of any arbitrary size.
I've done this before and am pretty sure it's possible in an awk or sed one-liner. Any help appreciated!
Answer 1:
So ... you COULD hack something together that would do basic parsing of your file as a text file...
awk -v txt="rumpel" '$0=="<packet>"{s=$0; found=0; next} $0~txt{found=1} {s=s RS $0} $0=="</packet>" && found {print s}' inp.xml
Broken out into pieces for easier explanation, this does the following:
-v txt="rumpel"
- sets a variable for use within the script. Note that this will be evaluated as a regex in this example, but you could use index() if you prefer to search for it as a string.
$0=="<packet>"{s=$0; found=0; next}
- If we find the start of a packet, reset our storage variable (s) and flag (found).
$0~txt{found=1}
- If we find the text we're looking for, set a flag.
{s=s RS $0}
- Append the current line to a variable, and
$0=="</packet>" && found {print s}
- if we're at the end of our text and the string was found, print.
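The index() alternative mentioned above could look roughly like the sketch below. It is only an illustration: the temporary sample file and the result variable are assumptions made for the demo, and the input is assumed to have each tag on its own line, exactly as in the question.

```shell
# Hypothetical sample input, mirroring the layout shown in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<xml>
<packet>
<proto>
<field show="bob">
</proto>
</packet>
<packet>
<proto>
<field show="rumpelstiltskin">
</proto>
</packet>
<packet>
<proto>
<field show="peter">
</proto>
</packet>
EOF

# index($0, txt) is non-zero when txt occurs literally in the line,
# so regex metacharacters in txt are not interpreted.
result=$(awk -v txt='show="rumpelstiltskin"' '
  $0 == "<packet>"           { s = $0; found = 0; next }  # start a new block
  index($0, txt)             { found = 1 }                # literal match
                             { s = s RS $0 }              # accumulate lines
  $0 == "</packet>" && found { print s }                  # emit matching block
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Passing the full show="rumpelstiltskin" string rather than a fragment avoids accidental matches on other attributes that happen to contain the same name.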
A better approach would likely be to interpret the XML using something that understands XML natively, but that isn't possible with just sed and awk.
Answer 2:
You need to treat your XML as XML and use an appropriate tool. For example, modifying your XML slightly to make it valid:
<xml>
<packet>
<proto>
<field show="bob"/>
</proto>
</packet>
<packet>
<proto>
<field show="rumpelstiltskin"/>
</proto>
</packet>
<packet>
<proto>
<field show="peter"/>
</proto>
</packet>
</xml>
You could use xmllint like this:
xmllint --xpath '//packet[proto/field/@show="rumpelstiltskin"]' file.xml
This matches and prints the contents of all <packet> elements that contain a <field show="rumpelstiltskin"> within a <proto> element.
If you don't want to specify the complete hierarchy, you can use something like this instead:
xmllint --xpath '//packet[descendant::field[@show="rumpelstiltskin"]]' file.xml
Answer 3:
You could do this with grep:
grep -B5 'show="rumpelstiltskin"' file | grep 'otherstring'
Obviously, adjust -B5 to however many lines of leading context are needed to reach the string you are looking for. The second grep then keeps only the context lines that contain that string.
Answer 4:
If your input is really that simple, all you need is:
$ awk '/<packet>/{buf=""} {buf=buf $0 RS} /rumpelstiltskin/{printf "%s",buf}' file
<packet>
<proto>
<field show="rumpelstiltskin">
or if you prefer:
$ awk '/<packet>/{buf="";f=0} {buf=buf $0 RS} /rumpelstiltskin/{f=1} f&&/<\/packet>/{printf "%s",buf}' file
<packet>
<proto>
<field show="rumpelstiltskin">
</proto>
</packet>
and if you want to stop reading the input file after the first print then just add ;exit after it, so printf "%s",buf becomes printf "%s",buf; exit.
Answer 5:
This might work for you (GNU sed):
sed '/<packet>/h;//!H;/rumpelstiltskin/!d;x;q' file
This stores the required strings in the hold space, prints them out and quits.
However, to be sure that both the first and second strings exist and are adjacent to one another:
sed '/<packet>/h;//!H;/rumpelstiltskin/!d;x;/<packet>.*rumpelstiltskin/!d;q' file
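For readers less familiar with the hold space, the first sed script above can be spread over several lines with comments, as in the sketch below. The temporary sample file and the result variable are assumptions added for the demo, and the reading of the empty regex // follows GNU sed behaviour.

```shell
# Hypothetical sample input, mirroring the layout shown in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<xml>
<packet>
<proto>
<field show="bob">
</proto>
</packet>
<packet>
<proto>
<field show="rumpelstiltskin">
</proto>
</packet>
<packet>
<proto>
<field show="peter">
</proto>
</packet>
EOF

result=$(sed '
  # A <packet> line starts a new block: overwrite the hold space with it.
  /<packet>/h
  # The empty regex reuses the last regex applied (/<packet>/),
  # so every other line is appended to the hold space.
  //!H
  # Lines without the search text are deleted, ending the cycle unprinted.
  /rumpelstiltskin/!d
  # On the matching line, swap the accumulated block into the pattern space.
  x
  # Quit; sed prints the pattern space automatically before exiting.
  q
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Because the script quits at the first match, the output ends at the matching line and does not include the closing </proto> and </packet> tags.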
Source: https://stackoverflow.com/questions/40639004/bash-using-awk-or-sed-to-search-backwards-from-occurance-to-a-specific-string