Question
After searching and reading extensively, I managed to get half of the work done.
Here is the string:
<td class='bold vmiddle'> Owner CIDR: </td><td><span class='jtruncate-text'><a href="http://3.abcdef.com/ip-3/encoded/czovL215aXAubXMvdmlldy9pcF9hZGRyZXNzZXMvNDIuMjI0LjAuMA%3D%3D">42.224.0.0</a>/12</span></td>
I need to extract the 42.224.0.0 and the /12 to make 42.224.0.0/12.
So far I have managed to get 42.224.0.0 by using:
sed -n 's/^.*<a.href="[^"]*">\([^<]*\).*/\1/p'
but I'm at a loss as to how to extract /12.
Can anyone help?
Answer 1:
You were pretty close:
sed -n 's/^.*<a.href="[^"]*">\([^<]*\)<\/a>\([^<]*\).*/\1\2/p' file
All that was needed was a 2nd capture group: <\/a> after the 1st one matches the closing tag for <a>, and the 2nd capture group, \([^<]*\), then captures everything up to but not including the closing </span> tag.
\1\2 in the replacement string simply concatenates what the two capture groups matched, yielding 42.224.0.0/12 with the sample input.
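For comparison, here is a minimal sketch of the same two-capture-group approach written as an extended regular expression with sed -E, so the parentheses need no backslashes. It assumes a sed that supports -E (GNU sed and BSD/macOS sed both do) and matches a literal space after <a where the original uses . (any character); the variable names html and cidr are illustrative only.

```shell
#!/bin/sh
# Sample input from the question; the quoted here-doc keeps both ' and " literal.
html=$(cat <<'EOF'
<td class='bold vmiddle'> Owner CIDR: </td><td><span class='jtruncate-text'><a href="http://3.abcdef.com/ip-3/encoded/czovL215aXAubXMvdmlldy9pcF9hZGRyZXNzZXMvNDIuMjI0LjAuMA%3D%3D">42.224.0.0</a>/12</span></td>
EOF
)

# Same idea as the BRE answer, written as an ERE:
#   group 1: link text between "> and </a>      -> 42.224.0.0
#   group 2: text after </a> up to the next <   -> /12
# -n plus the p flag prints only lines where the substitution succeeded.
cidr=$(printf '%s\n' "$html" |
  sed -n -E 's/^.*<a href="[^"]*">([^<]*)<\/a>([^<]*).*/\1\2/p')

echo "$cidr"   # prints 42.224.0.0/12
```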
Answer 2:
You can try the awk solution below:
vipin@kali:~$ awk -F'>|<' '{print $(NF-6),$(NF-4)}' OFS="" kk.txt
42.224.0.0/12
This needs multiple field separators (> and <), which -F'>|<' provides as a regular expression that splits on either character.
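The indices $(NF-6) and $(NF-4) depend on the exact markup around the cell, so they will shift if your input differs from the sample. A quick way to find the right positions is to number every field awk produces with the same separators. The sketch below does that for the sample line, then prints the two fields concatenated directly (adjacent expressions in awk are joined with no separator), which avoids setting OFS. The variable names html and cidr are illustrative only.

```shell
#!/bin/sh
# Sample input from the question; the quoted here-doc keeps both ' and " literal.
html=$(cat <<'EOF'
<td class='bold vmiddle'> Owner CIDR: </td><td><span class='jtruncate-text'><a href="http://3.abcdef.com/ip-3/encoded/czovL215aXAubXMvdmlldy9pcF9hZGRyZXNzZXMvNDIuMjI0LjAuMA%3D%3D">42.224.0.0</a>/12</span></td>
EOF
)

# Number every field produced by the ">|<" separator regex, showing both the
# absolute index and its offset from NF. Adjacent separators such as "><"
# yield empty fields, which still count toward the numbering.
printf '%s\n' "$html" |
  awk -F'>|<' '{ for (i = 1; i <= NF; i++) printf "%2d (NF-%d): [%s]\n", i, NF - i, $i }'

# Concatenate the address and prefix fields without a separator.
cidr=$(printf '%s\n' "$html" | awk -F'>|<' '{ print $(NF-6) $(NF-4) }')
echo "$cidr"   # prints 42.224.0.0/12
```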
Source: https://stackoverflow.com/questions/40393877/extract-data-between-two-tags