How can I extract the text between <a></a>? [closed]

会有一股神秘感。 Submitted on 2019-12-13 09:39:00

Question


I'm using Beautiful Soup, but couldn't figure out how to do it.

</td>
<td class="playbuttonCell">
    <a class="playbutton preview-track" href="/music/example" data-analytics-redirect="false"><img class="transparent_png play_icon" width="13" height="13" alt="Play" src="http://cdn.last.fm/flatness/preview/play_indicator.png" style="" /></a>
</td>
<td class="subjectCell" title="example, played 3 times">
    <div>
        <a href="/music/example">here lies the text i need</a>

This isn't doing the job:

print soup('a')

for link in soup('a'):
    print html   

It prints everything. What else can I try?


Answer 1:


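import urllib.request
from bs4 import BeautifulSoup

# Download the page and parse it with Python's built-in HTML parser
html = urllib.request.urlopen('http://www.last.fm/user/Jehl/charts?rangetype=overall&subtype=artists').read()
soup = BeautifulSoup(html, 'html.parser')
print(soup('a'))  # soup('a') is shorthand for soup.find_all('a')
# prints [<a href="/" id="lastfmLogo">Last.fm</a>, <a class="nav-link" href="/music">Music</a>....

To get the text of each link:

for link in soup('a'):
    # get_text() returns only the text inside the tag, without the markup
    print(link.get_text())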
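The loop above prints the text of every link on the page, including links such as the play button that wrap only an image and therefore have no text. A minimal sketch, assuming Beautiful Soup 4 with CSS selector support (`select_one`) is installed, runs the same idea offline on a trimmed copy of the question's HTML (the `snippet` string is illustrative, not the live page) and then narrows the search to the one link inside the `subjectCell` column:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question (illustrative input, not the live page)
snippet = (
    '<td class="playbuttonCell"><a class="playbutton preview-track" href="/music/example">'
    '<img alt="Play" src="http://cdn.last.fm/flatness/preview/play_indicator.png" /></a></td>'
    '<td class="subjectCell" title="example, played 3 times"><div>'
    '<a href="/music/example">here lies the text i need</a></div></td>'
)

soup = BeautifulSoup(snippet, 'html.parser')

# soup('a') is shorthand for soup.find_all('a'); get_text() keeps only the text
texts = [link.get_text(strip=True) for link in soup('a')]
print(texts)  # ['', 'here lies the text i need']

# Narrow the search to the link inside the subjectCell column
target = soup.select_one('td.subjectCell a')
print(target.get_text(strip=True))  # here lies the text i need
```

`get_text(strip=True)` drops surrounding whitespace. If the page layout changes, the `td.subjectCell a` selector would need to change with it.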


Source: https://stackoverflow.com/questions/13233548/how-can-i-extract-the-text-between-a-a
