Question
I want to find a sentence between the ...class="question-hyperlink"> tags.
With this code:
import urllib2
import re
response = urllib2.urlopen('https://stackoverflow.com/questions/tagged/python')
html = response.read(20000)
a = re.search('question-hyperlink', html)
print html[a.end()+3:a.end()+100]
I get:
DF5 for Python: high level vs low level interfaces. h5py</a></h3> <div class="excerpt">
How can I stop at the next <?
And how do I find the next sentence?
I want to do it with regex.
EDIT: To the downvoters: I want to do it like he does in "RegEx match open tags except XHTML self-contained tags".
Answer 1:
If you must do it with regular expressions, try something like this:
a = re.finditer('<a.+?question-hyperlink">(.+?)</a>', html)
for m in a:
    print m.group(1)
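To answer the "stop at the next <" part directly, one option is a negated character class: `[^<]+` matches everything up to, but not including, the next `<`. The sketch below is written for Python 3 and runs against a small hypothetical HTML string modelled on the output shown in the question, not against the live page. The `href` values and the second title are made up for illustration.

```python
import re

# Hypothetical sample markup, modelled on the output shown in the question.
html = (
    '<h3><a href="/questions/1" class="question-hyperlink">'
    'HDF5 for Python: high level vs low level interfaces. h5py</a></h3> '
    '<div class="excerpt">Some excerpt text</div>'
    '<h3><a href="/questions/2" class="question-hyperlink">'
    'Second title</a></h3>'
)

# [^<]+ matches one or more characters that are not '<', so each capture
# ends right before the next tag opens.
pattern = re.compile(r'class="question-hyperlink">([^<]+)')

# findall returns the captured group for every match, which also covers
# the "next sentence" part of the question.
titles = pattern.findall(html)
for title in titles:
    print(title)
```

This assumes `class="question-hyperlink"` is the last attribute before the closing `>` of the tag, as it is in the question. If the attribute order changes, the pattern no longer matches.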
Just for reference, this code does the same, but in a far more robust way:
from bs4 import BeautifulSoup  # assumes BeautifulSoup 4 is installed
doc = BeautifulSoup(html, 'html.parser')
for a in doc.findAll('a', 'question-hyperlink'):
    print a.text
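If installing BeautifulSoup is not an option, the same parser-based idea can be sketched with the standard library's `html.parser` module in Python 3. The `TitleCollector` class and the sample markup are hypothetical names and data chosen for illustration. Note that the second link puts `class` before `href`, which a regex anchored on `class="question-hyperlink">` would miss.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collects the text of every <a> whose class list contains
    'question-hyperlink'."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._inside = False  # True while inside a matching <a> element
        self._chunks = []     # text pieces seen inside the current <a>

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'a' and 'question-hyperlink' in classes:
            self._inside = True
            self._chunks = []

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._inside:
            self.titles.append(''.join(self._chunks))
            self._inside = False


# Hypothetical sample markup; the attribute order differs between links.
sample = (
    '<h3><a href="/questions/1" class="question-hyperlink">'
    'First title</a></h3>'
    '<h3><a class="question-hyperlink" href="/questions/2">'
    'Second title</a></h3>'
    '<a href="/other">Not a question link</a>'
)

collector = TitleCollector()
collector.feed(sample)
collector.close()
print(collector.titles)
```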
Source: https://stackoverflow.com/questions/8096798/python-find-a-sentence-between-some-website-tags-using-regex