Extract data between two tags

Posted by 佐手、 on 2020-01-15 10:34:21

Question


After searching and reading extensively, I managed to get half of the work done.

Here is the string:

<td class='bold vmiddle'> Owner CIDR: </td><td><span class='jtruncate-text'><a href="http://3.abcdef.com/ip-3/encoded/czovL215aXAubXMvdmlldy9pcF9hZGRyZXNzZXMvNDIuMjI0LjAuMA%3D%3D">42.224.0.0</a>/12</span></td>

I need to extract 42.224.0.0 and /12 and combine them into 42.224.0.0/12.

Now I managed to get 42.224.0.0 by using:

sed -n 's/^.*<a.href="[^"]*">\([^<]*\).*/\1/p'

but I'm at a loss as to how to extract the /12.

Can anyone help?


Answer 1:


You were pretty close:

sed -n 's/^.*<a.href="[^"]*">\([^<]*\)<\/a>\([^<]*\).*/\1\2/p' file

All that was needed was a 2nd capture group: <\/a> after the 1st one matches the closing tag for <a>, and the 2nd capture group, \([^<]*\), then captures everything up to but not including the closing </span> tag.
\1\2 in the replacement string simply concatenates what the two capture groups matched, yielding 42.224.0.0/12 with the sample input.
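
To try the command without a file, the sample line can be piped in directly. This is only a sketch: the variable names line and cidr are made up for illustration, and the string is the one from the question.

```shell
# Store the sample line; the quoted 'EOF' keeps both quote styles literal.
line=$(cat <<'EOF'
<td class='bold vmiddle'> Owner CIDR: </td><td><span class='jtruncate-text'><a href="http://3.abcdef.com/ip-3/encoded/czovL215aXAubXMvdmlldy9pcF9hZGRyZXNzZXMvNDIuMjI0LjAuMA%3D%3D">42.224.0.0</a>/12</span></td>
EOF
)

# Group 1 captures the link text, group 2 captures the text between </a> and the next tag.
cidr=$(printf '%s\n' "$line" |
  sed -n 's/^.*<a.href="[^"]*">\([^<]*\)<\/a>\([^<]*\).*/\1\2/p')

printf '%s\n' "$cidr"   # prints 42.224.0.0/12
```

Note that the leading ^.* is greedy, so on a line with several links this pattern would most likely pick up the last one that matches.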




Answer 2:


You can try the awk solution below:

vipin@kali:~$ awk -F'>|<' '{print $(NF-6),$(NF-4)}' OFS="" kk.txt
42.224.0.0/12

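This treats both > and < as field separators, so the tags and the text between them become separate fields. $(NF-6) and $(NF-4) then select the address and the prefix length by counting back from the last field, and OFS="" joins them without a space.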
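
To see why $(NF-6) and $(NF-4) are the right fields, it can help to print every field with its index first. The sketch below assumes the exact sample line from the question; the variable names line and cidr_awk are hypothetical.

```shell
# Store the sample line; the quoted 'EOF' keeps both quote styles literal.
line=$(cat <<'EOF'
<td class='bold vmiddle'> Owner CIDR: </td><td><span class='jtruncate-text'><a href="http://3.abcdef.com/ip-3/encoded/czovL215aXAubXMvdmlldy9pcF9hZGRyZXNzZXMvNDIuMjI0LjAuMA%3D%3D">42.224.0.0</a>/12</span></td>
EOF
)

# List each field with its index; empty fields appear wherever two tags are adjacent.
printf '%s\n' "$line" |
  awk -F'>|<' '{ for (i = 1; i <= NF; i++) printf "%d: [%s]\n", i, $i }'

# Concatenate the two fields of interest, counted from the end of the line.
cidr_awk=$(printf '%s\n' "$line" | awk -F'>|<' '{ print $(NF-6) $(NF-4) }')

printf '%s\n' "$cidr_awk"   # prints 42.224.0.0/12
```

With this input the line splits into 17 fields, with 42.224.0.0 at field 11 and /12 at field 13. Counting from the end only works while the closing tags after the link stay the same, so the sed approach is probably more robust if the markup varies.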



Source: https://stackoverflow.com/questions/40393877/extract-data-between-two-tags
