How to parse malformed HTML in Python


I need to browse the DOM tree of a parsed HTML document.

I'm using uTidyLib before parsing the string with lxml:

a = tidy.parseString(html_code, options) dom
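For comparison, here is a minimal sketch of parsing broken markup with lxml's own recovering HTML parser and then walking the resulting tree. It skips the uTidyLib step entirely, which is a substitution and not what the question does; the sample string `broken_html` and the helper `outline` are made up for illustration.

```python
from lxml import etree

# Hypothetical sample input: unclosed <p> and <b>, no closing tags at all.
broken_html = "<p>Hello <b>world<p>Second"

# recover=True is already the default for HTMLParser; it is spelled out
# here to make the error-tolerant behaviour explicit.
parser = etree.HTMLParser(recover=True)
root = etree.fromstring(broken_html, parser)

def outline(element, depth=0):
    """Return an indented list of tag names, one entry per element."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

tree_outline = outline(root)
paragraph_count = len(root.xpath("//p"))
text = "".join(root.itertext())

print("\n".join(tree_outline))
```

Whether this is good enough without a tidy pass depends on how badly broken the real input is, so it is worth trying on a representative document first.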

2 Answers

2021-01-04 04:10

Beautiful Soup does a good job with invalid or broken HTML:

>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup("...")
>>> print soup.prettify()

 
  hi

   
    
     hi
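The session above uses the old Python 2 `BeautifulSoup` module. A rough equivalent with the current `bs4` package might look like the sketch below. The input string `broken_html` is an invented example, not the one from the answer, and the `"html.parser"` backend is chosen only because it ships with Python; `lxml` or `html5lib` can be passed instead if installed, and they may repair the markup differently.

```python
from bs4 import BeautifulSoup

# Hypothetical sample input: unclosed <p> and <b>, no closing tags at all.
broken_html = "<p>Hello <b>world<p>Second"

# "html.parser" is the standard-library backend; other backends may build
# a differently shaped tree from the same broken input.
soup = BeautifulSoup(broken_html, "html.parser")

# find_all(True) matches every tag, in document order.
tag_names = [tag.name for tag in soup.find_all(True)]
paragraphs = soup.find_all("p")
all_text = soup.get_text()

# prettify() prints the repaired tree with closing tags filled in.
print(soup.prettify())
```

From here the tree can be browsed with attributes such as `.children`, `.parent`, and `.find_all()`.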
