How to parse an HTML file and get the text between the tags using Python? [duplicate]

Question

Possible Duplicate:
Parsing HTML in Python

I have searched the internet for a way to get the text between HTML tags using Python. Could someone please explain how to do it?

Answer 1:

Here is an example of using BeautifulSoup to parse HTML:

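from bs4 import BeautifulSoup  # BeautifulSoup 4 (pip install beautifulsoup4)

soup = BeautifulSoup("""<html><body>
                        <div id="a" class="c1">
                            We want to get this
                        </div>
                        <div id="b">
                            We don't want to get this
                        </div></body></html>""", "html.parser")
# find() returns the first matching tag; calling soup(...) returns a list of matches
print(soup.find('div', id='a').text.strip())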

This outputs

We want to get this

Answer 2:

The HTML parser linked in the comments above is probably the more robust way to go. But if you only need a simple piece of content between particular tags, you can use a regular expression:

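import re

# Double quotes outside, so the single-quoted attribute values do not end the string
html = "<html><body><div id='blah-content'>Blah</div><div id='content-i-want'>good stuff</div></body></html>"
# [^>]* stays inside the opening tag; (.*?) lazily captures up to the first </div>
m = re.search(r"<div[^>]*id='content-i-want'[^>]*>(.*?)</div>", html)
if m:
    print(m.group(1))  # prints: good stuff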
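
For comparison, here is a rough sketch of the parser-based route using only the standard library's html.parser module. The class name DivTextExtractor and the helper text_in_div are made up for this illustration, and the sketch assumes you want the text of a single <div> identified by its id. It is not meant as the one right way to do this, just a starting point that copes with nested tags better than the regex does.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect the text inside the <div> whose id matches target_id (hypothetical helper)."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0    # > 0 while the parser is inside the target <div>
        self.chunks = []  # pieces of text found inside the target

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # a nested <div> inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1   # entering the target <div>

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_in_div(html, target_id):
    """Return the stripped text of the <div> with the given id, or '' if it is absent."""
    parser = DivTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


html = ("<html><body><div id='blah-content'>Blah</div>"
        "<div id='content-i-want'>good <b>stuff</b></div></body></html>")
print(text_in_div(html, "content-i-want"))  # prints: good stuff
```

Unlike the regex, this keeps working when the target contains other tags (the <b> above) or nested <div> elements, because it tracks how deep it is inside the target instead of stopping at the first </div>.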

Source: https://stackoverflow.com/questions/7080506/how-to-parse-a-html-file-and-get-the-text-which-is-in-between-the-tags-by-using

Tags

python

html-parsing
