Question
Possible Duplicate:
Parsing HTML in Python
I have searched the internet for a way to get the text between HTML tags using Python. Can you please explain how to do this?
Answer 1:
Here is an example of using BeautifulSoup to parse HTML:
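from bs4 import BeautifulSoup  # BeautifulSoup 4 (pip install beautifulsoup4)

soup = BeautifulSoup("""<html><body>
<div id="a" class="c1">
We want to get this
</div>
<div id="b">
We don't want to get this
</div></body></html>""", "html.parser")

# find() returns the first matching tag; soup(...) would return a list of matches,
# which has no .text attribute. strip() drops the surrounding newlines.
print(soup.find('div', id='a').text.strip())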
This outputs:
We want to get this
Answer 2:
The HTML parser suggested in the link in the comments above is probably the more robust way to go. But if you only need a simple bit of content that sits between particular tags, you can use a regular expression:
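import re

# Double quotes outside so the single-quoted attribute values don't end the string
html = "<html><body><div id='blah-content'>Blah</div><div id='content-i-want'>good stuff</div></body></html>"

# Match the opening tag of the target div, then capture everything up to the next </div>
m = re.search(r"<div[^>]*id='content-i-want'[^>]*>(.*?)</div>", html)
if m:
    print(m.group(1))  # prints: good stuff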
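For comparison, here is a rough sketch of the parser-based route using only the standard library's html.parser module in Python 3. It is one possible way to do it, not necessarily what the linked question recommends. The names DivTextExtractor and text_of_div are made up for this example, and it assumes you want the text of the first div with a given id, including text inside nested tags.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect the text inside the first <div> whose id equals target_id.

    DivTextExtractor and target_id are hypothetical names for this sketch.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # <div> nesting depth inside the target; 0 means outside
        self.done = False   # set once the target div has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.done:
            return
        if self.depth > 0:
            self.depth += 1  # a nested div inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1   # entered the target div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # left the target div; ignore everything after

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)

    def text(self):
        return "".join(self.chunks).strip()


def text_of_div(html, target_id):
    """Return the text of the first div with the given id, or '' if none is found."""
    parser = DivTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    return parser.text()


html = """<html><body>
<div id="blah-content">Blah</div>
<div id="content-i-want">good <b>stuff</b></div>
</body></html>"""

print(text_of_div(html, "content-i-want"))
```

Unlike the regular expression, this keeps working when the target div contains other tags or nested divs, and it does not care how the attributes are quoted.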
Source: https://stackoverflow.com/questions/7080506/how-to-parse-a-html-file-and-get-the-text-which-is-in-between-the-tags-by-using