How to parse malformed HTML in python

别那么骄傲 2021-01-04 03:51

I need to browse the DOM tree of a parsed HTML document.

I'm using uTidyLib to clean up the string before parsing it with lxml:

a = tidy.parseString(html_code, options)
dom
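The second half of that pipeline can be sketched with the standard library's `xml.etree.ElementTree`, used here as a stand-in for lxml (an assumption of this sketch, chosen so it runs without third-party packages). An XML parser accepts only well-formed markup. If tidy has repaired the input, the tree can be browsed with `Element.iter()`. If it has not, a `ParseError` is raised. `element_tags`, `well_formed`, and `broken` are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET


def element_tags(markup: str) -> list[str]:
    """Parse well-formed (X)HTML and return tag names in document order.

    Raises xml.etree.ElementTree.ParseError if the markup is not
    well-formed, which is the failure a tidy pass is meant to prevent.
    """
    root = ET.fromstring(markup)
    # Element.iter() walks the whole tree depth-first, starting at root.
    return [el.tag for el in root.iter()]


# Hypothetical inputs: one already well-formed, one with mis-nested tags.
well_formed = "<html><body><table><tr><td>hi</td></tr></table></body></html>"
broken = "<html><body><table><tr><td>hi</tr></td></body>"

print(element_tags(well_formed))  # ['html', 'body', 'table', 'tr', 'td']

try:
    element_tags(broken)
except ET.ParseError as exc:
    # expat reports the </tr> that does not match the open <td>.
    print("not well-formed:", exc)
```

This is why the tidy step is load-bearing: whenever it fails to produce well-formed output, a strict XML parse fails too.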

2 Answers
  •  独厮守ぢ
    2021-01-04 04:10

    Beautiful Soup does a good job of handling invalid or broken HTML:

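    >>> from bs4 import BeautifulSoup
    >>> soup = BeautifulSoup("<html><body><table><tr><td>hi</tr></td></body>", "html.parser")
    >>> print(soup.prettify())
    <html>
     <body>
      <table>
       <tr>
        <td>
         hi
        </td>
       </tr>
      </table>
     </body>
    </html>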
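Beautiful Soup's tree builder reacts to bad nesting with one recovery rule among others: an end tag closes everything opened after the nearest matching start tag, and an end tag with no matching open tag is dropped. The sketch below imitates only that rule using the standard library's `html.parser`. The names `Node`, `LenientTreeBuilder`, `parse`, and `dump` are hypothetical. The sketch is far less thorough than Beautiful Soup: it stores no attributes, does not implicitly close `<p>` or `<li>`, and does no encoding detection.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # Node objects and text strings, in order


class LenientTreeBuilder(HTMLParser):
    """Build a simple tree, tolerating mis-nested and unclosed tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]  # currently open elements

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.stack[-1])
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop up to and including the nearest open tag with this name;
        # if no such tag is open, the stray end tag is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(data.strip())


def parse(markup):
    builder = LenientTreeBuilder()
    builder.feed(markup)
    builder.close()  # tags still open at the end are implicitly closed
    return builder.root


def dump(node, depth=0):
    """Return an indented outline of the tree, one node per line."""
    lines = []
    for child in node.children:
        if isinstance(child, str):
            lines.append(" " * depth + child)
        else:
            lines.append(" " * depth + child.tag)
            lines.extend(dump(child, depth + 1))
    return lines


print("\n".join(dump(parse("<html><body><table><tr><td>hi</tr></td></body>"))))
```

Given the same mis-nested table, the sketch builds html > body > table > tr > td > "hi". The stray `</tr>` closes both `td` and `tr`, and the orphaned `</td>` that follows is skipped.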
