Parsing HTML using Python

爱一瞬间的悲伤 2020-11-22 00:35

I'm looking for an HTML parser module for Python that can help me get the tags in the form of Python lists/dictionaries/objects.

If I have a document of the form:

7 answers
  •  南旧 (original poster)
    2020-11-22 01:15

    Here you can read more about different HTML parsers in Python and their performance. Even though the article is a bit dated, it still gives you a good overview.

    Python HTML parser performance

    I'd recommend BeautifulSoup even though it isn't built in, simply because it's so easy to work with for these kinds of tasks. For example:

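    from urllib.request import urlopen  # Python 3 replacement for urllib2
    from bs4 import BeautifulSoup       # BeautifulSoup 4 (package: beautifulsoup4)
    
    # Fetch the page and parse it with the standard library's parser backend
    page = urlopen('http://www.google.com/')
    soup = BeautifulSoup(page, 'html.parser')
    
    # Text of the first <div class="container"> inside <body>
    x = soup.body.find('div', attrs={'class' : 'container'}).text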
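If adding a dependency isn't an option, the standard library's html.parser module can also turn tags into plain Python lists and dictionaries, which is what the question asks for. The following is only a minimal sketch: the names TagCollector, parse_tags, and VOID_TAGS, the shape of the dictionaries, and the sample document are all made up for illustration, and it is far less forgiving of messy markup than BeautifulSoup.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay "open"
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
             'input', 'link', 'meta', 'source', 'track', 'wbr'}


class TagCollector(HTMLParser):
    """Hypothetical helper: records each element as a dict in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []    # every element seen so far
        self._open = []   # stack of elements that are still open

    def handle_starttag(self, tag, attrs):
        node = {'tag': tag, 'attrs': dict(attrs), 'text': ''}
        self.tags.append(node)
        if tag not in VOID_TAGS:
            self._open.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, tolerating unclosed children
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i]['tag'] == tag:
                del self._open[i:]
                break

    def handle_data(self, data):
        # Attach non-blank text to the innermost open element
        if self._open and data.strip():
            self._open[-1]['text'] += data.strip()


def parse_tags(html):
    """Return a list of {'tag', 'attrs', 'text'} dicts for the given HTML string."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


# Made-up sample document, mirroring the 'container' lookup above
sample = '<html><body><div class="container"><p>Hello</p></div></body></html>'
tags = parse_tags(sample)
containers = [t for t in tags if t['attrs'].get('class') == 'container']
```

Here tags is an ordinary list, so filtering by tag name or attribute is just a list comprehension. Each element's 'text' holds only its own direct text, not the text of its children, which is one of the conveniences BeautifulSoup's .text gives you for free.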
    
