Python: Find a Sentence between some website-tags using regex

守給你的承諾、 submitted on 2019-12-13 09:38:10

Question


I want to find the sentence between the ...class="question-hyperlink"> tag and its closing tag. With this code:

import urllib2
import re

response = urllib2.urlopen('https://stackoverflow.com/questions/tagged/python')
html = response.read(20000)

a = re.search('question-hyperlink', html)
print html[a.end()+3:a.end()+100]

I get:

DF5 for Python: high level vs low level interfaces. h5py</a></h3>        <div class="excerpt">

How can I stop at the next <? And how do I find the next sentence? I want to do it with regex.

EDIT: To the downvoters: I want to do it the way it is done in "RegEx match open tags except XHTML self-contained tags".


Answer 1:


If you must do it with regular expressions, try something like this:

# The non-greedy (.+?) stops at the first "</a>" after each question link
a = re.finditer('<a.+?question-hyperlink">(.+?)</a>', html)
for m in a:
    print m.group(1)
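For readers on Python 3, here is a minimal sketch of the same regex run against a small hypothetical HTML fragment (the `href` values and titles are made up for illustration; a real page would be fetched with `urllib.request` instead of `urllib2`). It also shows a variant that literally stops at the next `<` by using the negated character class `[^<]*`, which is what the question asks for.

```python
import re

# Hypothetical sample markup standing in for the downloaded page.
html = '''
<div class="summary">
  <h3><a href="/questions/1" class="question-hyperlink">First title</a></h3>
  <a href="/users/42">some user</a>
  <h3><a href="/questions/2" class="question-hyperlink">Second title</a></h3>
</div>
'''

# The answer's pattern as a raw string; the non-greedy (.+?) stops at the
# first "</a>" after each question link.
pattern = r'<a.+?question-hyperlink">(.+?)</a>'
for m in re.finditer(pattern, html):
    print(m.group(1))

# With exactly one group, re.findall returns the captured text directly.
titles = re.findall(pattern, html)

# Variant that literally "stops at the next <": [^<]* matches any run of
# characters other than "<", so no closing tag has to be spelled out.
next_lt = re.findall(r'question-hyperlink">([^<]*)', html)
print(next_lt)
```

Both patterns assume the class attribute comes last and is immediately followed by `>`, and `.` does not match newlines by default, so they are fragile against real-world markup.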

Just for reference, this code does the same thing, but far more robustly:

from bs4 import BeautifulSoup

doc = BeautifulSoup(html, 'html.parser')
for a in doc.findAll('a', 'question-hyperlink'):  # a string here matches the CSS class
    print a.text
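If installing BeautifulSoup is not an option, the same parser-based idea can be sketched with Python 3's standard-library `html.parser` (swapped in here for BeautifulSoup; the class name `QuestionLinkParser` and the sample markup are hypothetical). It collects the text of every `<a>` whose class list contains `question-hyperlink`.

```python
from html.parser import HTMLParser


class QuestionLinkParser(HTMLParser):
    """Collects the text of every <a> whose class list includes question-hyperlink."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.titles = []
        self._inside = False  # True while inside a matching <a> element
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            # attrs is a list of (name, value) pairs; value may be None
            classes = (dict(attrs).get('class') or '').split()
            if 'question-hyperlink' in classes:
                self._inside = True
                self._parts = []

    def handle_endtag(self, tag):
        if tag == 'a' and self._inside:
            self.titles.append(''.join(self._parts).strip())
            self._inside = False

    def handle_data(self, data):
        # Text inside nested inline tags such as <b> is collected too
        if self._inside:
            self._parts.append(data)


# Hypothetical sample markup
html = '''
<h3><a href="/questions/1" class="question-hyperlink">First &amp; title</a></h3>
<h3><a class="question-hyperlink" href="/questions/2">Second <b>bold</b> title</a></h3>
'''

parser = QuestionLinkParser()
parser.feed(html)
parser.close()
print(parser.titles)
```

This shows why a parser is sturdier: the regex from the answer would return `First &amp; title` unconverted and would miss the second link entirely, because its class attribute is not immediately followed by `>`.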


Source: https://stackoverflow.com/questions/8096798/python-find-a-sentence-between-some-website-tags-using-regex
