beautifulsoup: Parse Span Title

Posted by 血红的双手。 on 2019-12-11 19:43:26

Question


I am trying to parse an HTML page. I have successfully reached the relevant subtree of the DOM, but I am stuck at a point where there are nested span tags.

For example, I initially parse the page as follows:

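        # Assumed imports (not shown in the original post):
        #   import urllib2
        #   from bs4 import BeautifulSoup as bs
        user_url = base_url + str(user_id) + "/" + display_name
        user_page = urllib2.urlopen(user_url)
        souping_page = bs(user_page)
        # Locate the <div class="badges"> container
        badges = souping_page.body.find('div', attrs={'class': 'badges'})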

badges gives me the following:

<span><span title="3 gold badges"><span class="badge1"></span><span class="badgecount">3</span></span><span title="23 silver badges"><span class="badge2"></span><span class="badgecount">23</span></span><span title="43 bronze badges"><span class="badge3"></span><span class="badgecount">43</span></span></span>

But I am trying to extract <span title="3 gold badges"> and all the other span title attributes by traversing the DOM structure. How can I do that in BeautifulSoup?


Answer 1:


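Attribute-style navigation returns the first descendant tag with that name, so you can reach the first titled span like this: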

>>> badges.span.span
<span title="3 gold badges"><span class="badge1"></span><span class="badgecount">3</span></span>
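
Note that badges.span.span returns only the first titled span, while the question asks for all of the title attributes. A minimal sketch of one way to collect them, assuming BeautifulSoup 4 (the bs4 package) and using the badge markup from the question, wrapped in a div, as inline sample data. The names html, titled_spans, and titles are illustrative and do not come from the original post:

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question, wrapped in the badges div
# (assumption: the real page nests the spans directly inside this div).
html = (
    '<div class="badges"><span>'
    '<span title="3 gold badges"><span class="badge1"></span>'
    '<span class="badgecount">3</span></span>'
    '<span title="23 silver badges"><span class="badge2"></span>'
    '<span class="badgecount">23</span></span>'
    '<span title="43 bronze badges"><span class="badge3"></span>'
    '<span class="badgecount">43</span></span>'
    '</span></div>'
)

soup = BeautifulSoup(html, "html.parser")
badges = soup.find("div", attrs={"class": "badges"})

# title=True matches only <span> tags that carry a title attribute,
# so the badge1/badgecount spans are skipped.
titled_spans = badges.find_all("span", title=True)

# Read the attribute value from each matching tag.
titles = [span["title"] for span in titled_spans]
print(titles)
```

Filtering on title=True avoids depending on the exact nesting depth, which may differ on the live page.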


Source: https://stackoverflow.com/questions/22122307/beautifulsoup-parse-span-title
