Question
After searching and reading extensively, I managed to get half of the work done.
Here is the string:
<td class='bold vmiddle'> Owner CIDR: </td><td><span class='jtruncate-text'><a href="http://3.abcdef.com/ip-3/encoded/czovL215aXAubXMvdmlldy9pcF9hZGRyZXNzZXMvNDIuMjI0LjAuMA%3D%3D">42.224.0.0</a>/12</span></td>
I need to extract the 42.224.0.0 and /12 to make a 42.224.0.0/12.
Now I managed to get 42.224.0.0 by using:
sed -n 's/^.*<a.href="[^"]*">\([^<]*\).*/\1/p'
but I'm at a loss how to extract /12.
Can anyone help?
Answer 1:
You were pretty close:
sed -n 's/^.*<a.href="[^"]*">\([^<]*\)<\/a>\([^<]*\).*/\1\2/p' file
All that was needed was a second capture group: the <\/a> after the first group matches the closing </a> tag, and the second capture group, \([^<]*\), then captures everything up to, but not including, the closing </span> tag. \1\2 in the replacement string simply concatenates what the two capture groups matched, yielding 42.224.0.0/12 with the sample input.
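For anyone who wants to try this end to end, here is a minimal sketch that feeds the sample line from the question through the same sed command. The temporary file and the variable names tmpfile and result are only there for the demo and are not part of the original answer.

```shell
# Write the sample line from the question to a temporary file.
# The quoted 'EOF' keeps the shell from expanding anything inside the line.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
<td class='bold vmiddle'> Owner CIDR: </td><td><span class='jtruncate-text'><a href="http://3.abcdef.com/ip-3/encoded/czovL215aXAubXMvdmlldy9pcF9hZGRyZXNzZXMvNDIuMjI0LjAuMA%3D%3D">42.224.0.0</a>/12</span></td>
EOF

# Group 1 captures the link text, group 2 the text between </a> and the next tag.
result=$(sed -n 's/^.*<a.href="[^"]*">\([^<]*\)<\/a>\([^<]*\).*/\1\2/p' "$tmpfile")

echo "$result"   # prints 42.224.0.0/12

rm -f "$tmpfile"
```

Because of -n and the p flag, lines that do not match the pattern print nothing, so the same command can be run over a whole HTML file.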
Answer 2:
You can try the awk solution below:
vipin@kali:~$ awk -F'>|<' '{print $(NF-6),$(NF-4)}' OFS="" kk.txt
42.224.0.0/12
The key is to use both > and < as field separators.
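To see why $(NF-6) and $(NF-4) are the right fields, it can help to dump every field with its index first. This is only a sketch against the sample line from the question; the file and variable names (tmpfile, cidr) are made up for the demo. Note that the approach depends on field positions, so it assumes the markup after the address always has the same shape as the sample.

```shell
# Write the sample line from the question to a temporary file.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
<td class='bold vmiddle'> Owner CIDR: </td><td><span class='jtruncate-text'><a href="http://3.abcdef.com/ip-3/encoded/czovL215aXAubXMvdmlldy9pcF9hZGRyZXNzZXMvNDIuMjI0LjAuMA%3D%3D">42.224.0.0</a>/12</span></td>
EOF

# Show each field produced when splitting on either > or <.
# With the sample line there are 17 fields: field 11 is the address
# and field 13 is the /12 prefix, i.e. NF-6 and NF-4.
awk -F'>|<' '{ for (i = 1; i <= NF; i++) printf "%d: [%s]\n", i, $i }' "$tmpfile"

# Same extraction as in the answer; OFS="" joins the two fields with no space.
cidr=$(awk -F'>|<' '{ print $(NF-6), $(NF-4) }' OFS="" "$tmpfile")

echo "$cidr"   # prints 42.224.0.0/12

rm -f "$tmpfile"
```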
Source: https://stackoverflow.com/questions/40393877/extract-data-between-two-tags