Extract IP address from HTML document

Submitted by 馋奶兔 on 2020-01-15 05:51:19

Question


How can I print the IP address (86.23.215.130) from the following line? The entire file (not shown) is the stdout of a wget call, hence HTML. It sounds easy, but I couldn't get it to work.

...
<tr><td align=center colspan=3 bgcolor="D0D0D0"><font face="Arial, Monospace" size=+3>86.23.215.130</font></td></tr>
...

Thanks


Answer 1:


Why sed? I believe grep is a better fit:

grep -iohP '(?<=\x3e)([0-9]+\.){3}[0-9]+(?=\x3c)' file

where \x3e means > and \x3c means < (ASCII hex codes). The -P option (Perl-compatible regex, needed for the lookarounds) requires GNU grep.

sed can do this too, although I don't recommend it:

sed -rn 's/.*\x3e(([0-9]+\.){3}[0-9]+)\x3c.*/\1/p' file

Thanks to Mr. Sternad, I improved this a little bit.
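
If grep -P is not available, roughly the same result can be had without lookarounds: match the address together with the surrounding > and <, then strip those two characters. This is only a sketch; the function name extract_ip and the variable line are made up for illustration, and it assumes a grep that supports -o.

```shell
# Sample line from the question, stored in a variable for a quick check.
line='<tr><td align=center colspan=3 bgcolor="D0D0D0"><font face="Arial, Monospace" size=+3>86.23.215.130</font></td></tr>'

# extract_ip (hypothetical helper): reads HTML on stdin and prints every
# dotted run of four digit groups that sits directly between > and <.
extract_ip() {
  grep -oE '>([0-9]+\.){3}[0-9]+<' | tr -d '<>'
}

printf '%s\n' "$line" | extract_ip   # prints 86.23.215.130
```

Like the commands above, this does not check that each number is at most 255.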




Answer 2:


If you want to extract the IP address only, you should use the following command:

sed -E -n 's/.*>([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)<.*/\1/p' file.txt

Here is what it does:

  • -E switches sed into extended regex mode (GNU sed also accepts -r)
  • -n suppresses the automatic printing of every input line, so only lines printed explicitly appear
  • 's/something/something2/p' substitutes something with something2 and prints the line if a substitution was made
  • ([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+) captures four groups of digits separated by dots
  • \1 is a reference to the captured group above

Note that this regex does not validate IP addresses; it matches any four runs of digits separated by dots, such as 999.999.999.999.
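
One way to tighten this without a stricter regex is to filter the extracted candidates afterwards. The following is a minimal sketch, assuming the candidates arrive one per line; valid_ipv4 is a hypothetical helper name, not part of any tool.

```shell
# valid_ipv4 (hypothetical helper): reads candidate addresses on stdin, one
# per line, and prints only those with exactly four all-digit fields whose
# values do not exceed 255.
valid_ipv4() {
  awk -F. '
    NF == 4 {
      ok = 1
      for (i = 1; i <= 4; i++)
        if ($i !~ /^[0-9]+$/ || $i > 255) ok = 0
      if (ok) print
    }'
}

printf '%s\n' 86.23.215.130 486.23.215.130 1.2.3 | valid_ipv4   # prints 86.23.215.130
```

It could be appended to the sed command above with a pipe, for example sed -E -n '...' file.txt | valid_ipv4.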

If you want more flexibility (and accuracy), you could use the Perl Regexp::Common module, which validates IP addresses.

perl -MRegexp::Common -lne 'print $1 if /($RE{net}{IPv4})/' file.txt

Note that you have to correctly anchor your expression, otherwise an invalid IP, like 486.23.215.130 will be reduced to a valid address of 86.23.215.130.
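
The anchoring pitfall can be reproduced without Perl. The sketch below uses a hand-written extended regex for one octet (0 to 255) as a stand-in for the module's pattern; the variable and function names are illustrative assumptions.

```shell
# One octet in the range 0-255, written as an extended regex.
octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
ipv4="${octet}(\\.${octet}){3}"

# Unanchored search: the match may start in the middle of a number.
find_unanchored() { grep -oE "$ipv4"; }

# Anchored to the whole line with -x: partial matches are rejected.
find_anchored() { grep -xE "$ipv4"; }

printf '486.23.215.130\n' | find_unanchored                  # prints 86.23.215.130
printf '486.23.215.130\n' | find_anchored || echo 'no match' # prints no match
```

Here -x anchors to the whole line, which only works when each candidate is on a line of its own; inside HTML, the surrounding > and < play the same role.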




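Answer 3:


IPv4 addresses are four groups of 1-3 digits separated by three periods. The following prints every line that contains such a pattern (the whole line, not just the address):

sed -n '/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/p' infile.txt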



Answer 4:


What about this? Any remarks?

grep "size=+3" | awk -F'[<>]' '{print $7}'

I know, it assumes that the IP is always at the same position in the line containing size=+3. Your suggestions are all formulated far more generally, and are therefore better suited to arbitrary input text.
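
The dependence on the field position can be relaxed by testing every field instead of printing a fixed one. This is a sketch along the same lines, not a replacement for the answers above; ip_fields and line are hypothetical names.

```shell
# Sample line from the question.
line='<tr><td align=center colspan=3 bgcolor="D0D0D0"><font face="Arial, Monospace" size=+3>86.23.215.130</font></td></tr>'

# ip_fields (hypothetical helper): splits each input line on < and > and
# prints any field that consists solely of four dot-separated digit groups.
ip_fields() {
  awk -F'[<>]' '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) print $i
  }'
}

printf '%s\n' "$line" | ip_fields   # prints 86.23.215.130
```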



Source: https://stackoverflow.com/questions/36101699/extract-ip-address-from-html-document
