How to detect with Python whether a string contains HTML code?

没有蜡笔的小新 2020-12-29 21:44

How can I detect whether a string contains HTML (it can be HTML4, HTML5, or just fragments of HTML within text)? I do not need the HTML version, but rather whether the string is just

4 answers
  •  Happy的楠姐
    2020-12-29 22:43

    You can use an HTML parser such as BeautifulSoup. Note that it does its best to parse any input as HTML, even broken HTML, and how lenient it is depends on the underlying parser:

    >>> from bs4 import BeautifulSoup
    >>> html = """
    ... <title>I'm title</title>
    ... """
    >>> non_html = "This is not an html"
    >>> bool(BeautifulSoup(html, "html.parser").find())
    True
    >>> bool(BeautifulSoup(non_html, "html.parser").find())
    False
    

    Called without arguments, find() returns the first element in the parsed tree, or None if there is none. The result is therefore True whenever the string contains at least one HTML element.

    Another example with an HTML fragment:

    >>> html = "Hello, <b>world</b>"
    >>> bool(BeautifulSoup(html, "html.parser").find())
    True
    

    Alternatively, you can use lxml.html:

    >>> import lxml.html
    >>> html = 'Hello, <b>world</b>'
    >>> non_html = "<"
    >>> lxml.html.fromstring(html).find('.//*') is not None
    True
    >>> lxml.html.fromstring(non_html).find('.//*') is not None
    False
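
If adding a third-party dependency is not an option, the same idea can be sketched with the standard library's html.parser module. This is not part of the answer above, only a minimal sketch. The names `_TagDetector` and `contains_html` are hypothetical. It treats any syntactically valid start tag as HTML, so text that merely happens to form a tag would be reported as HTML too.

```python
from html.parser import HTMLParser


class _TagDetector(HTMLParser):
    """Hypothetical helper: remembers whether any start tag was seen."""

    def __init__(self):
        super().__init__()
        self.found_tag = False

    def handle_starttag(self, tag, attrs):
        # Called for opening tags such as <b>; self-closing tags such as
        # <br/> also end up here through the default handle_startendtag.
        self.found_tag = True


def contains_html(text):
    """Return True if `text` contains at least one HTML start tag."""
    detector = _TagDetector()
    detector.feed(text)
    detector.close()  # flush any buffered trailing text
    return detector.found_tag


if __name__ == "__main__":
    print(contains_html("Hello, <b>world</b>"))
    print(contains_html("This is not an html"))
```

As with the BeautifulSoup check, plain multi-line text and a stray "<" followed by a space are treated as text rather than as markup.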
    
