Question
I'm using Beautiful Soup, but I couldn't figure out how to extract the text between <a> and </a> in HTML like this:
</td>
<td class="playbuttonCell">
<a class="playbutton preview-track" href="/music/example" data-analytics-redirect="false" ><img class="transparent_png play_icon" width="13" height="13" alt="Play" src="http://cdn.last.fm/flatness/preview/play_indicator.png" style="" /></a> </td>
<td class="subjectCell" title="example, played 3 times">
<div>
<a href="/music/example" >here lies the text i need</a>
This isn't doing the job:
print(soup('a'))
for link in soup('a'):
    print(html)
It prints everything. What else can I try?
Answer 1:
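from urllib.request import urlopen
from bs4 import BeautifulSoup
# Fetch the page and parse it with the standard-library HTML parser.
html = urlopen('http://www.last.fm/user/Jehl/charts?rangetype=overall&subtype=artists').read()
soup = BeautifulSoup(html, 'html.parser')
# soup('a') is shorthand for soup.find_all('a').
print(soup('a'))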
# prints [<a href="/" id="lastfmLogo">Last.fm</a>, <a class="nav-link" href="/music">Music</a>....
To get the text of each link, call get_text() on each one:
for link in soup('a'):
    print(link.get_text())
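The loop above prints the text of every link on the page, including the navigation links. If only the link inside the subjectCell cell is wanted, one option is to narrow the search to that cell first. The following is a minimal sketch rather than a definitive solution. It assumes the wanted links always sit inside a td with class subjectCell, as in the fragment from the question. The HTML string is a shortened, closed-off copy of that fragment, and the function name subject_link_texts is made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened copy of the fragment from the question, with the table, row and
# div tags closed so that it parses cleanly (an assumption, not the real page).
HTML = """
<table><tr>
<td class="playbuttonCell">
<a class="playbutton preview-track" href="/music/example"><img alt="Play" src="play_indicator.png" /></a>
</td>
<td class="subjectCell" title="example, played 3 times">
<div>
<a href="/music/example">here lies the text i need</a>
</div>
</td>
</tr></table>
"""


def subject_link_texts(html):
    """Return the text of every <a> found inside a td.subjectCell cell."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    # class_ (with a trailing underscore) filters on the HTML class attribute.
    for cell in soup.find_all("td", class_="subjectCell"):
        for link in cell.find_all("a"):
            # strip=True trims surrounding whitespace from the link text.
            texts.append(link.get_text(strip=True))
    return texts


print(subject_link_texts(HTML))
```

The play-button link is skipped here because it lives in a different cell, so only the text from the subjectCell link is collected.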
Source: https://stackoverflow.com/questions/13233548/how-can-i-extract-the-text-between-a-a