Question
On the page below (link), I'm trying to use BeautifulSoup to extract the <a> texts at the very bottom, i.e., 'Private Life' and 'Lost Boy'.
But I'm having a hard time scraping the <iframe> content.
I've learned that the browser fetches iframe content with a separate request.
So I've tried:
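import urllib2
from bs4 import BeautifulSoup

iframexx = soup.find_all('iframe')
for iframe in iframexx:
    try:
        # note: this passes the Tag object itself to urlopen(), not its src URL
        response = urllib2.urlopen(iframe)
        results = BeautifulSoup(response)
        print results
    except Exception as e:
        print e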
But that returns None.
How do I parse the HTML below so I can fetch the text of each <a> tag, i.e. a.get_text()?
Answer 1:
Browsers load iframe content in a separate request, so you need to fetch the URL given in the iframe's src attribute. You can use Selenium if you want, or scrape the data directly.
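A minimal sketch of that two-step approach, using the standard library's html.parser in place of BeautifulSoup so it runs without third-party packages: first collect each iframe's src from the parent page (resolving relative URLs with urljoin), then, once that URL's HTML has been fetched separately, collect the text of its <a> tags. The class and function names are hypothetical, and step 2 only helps when the links are present in the iframe's HTML rather than added later by JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collects iframe src attributes and the text inside <a> tags."""

    def __init__(self):
        super().__init__()
        self.iframe_srcs = []
        self.anchor_texts = []
        self._in_anchor = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "iframe" and attrs.get("src"):
            self.iframe_srcs.append(attrs["src"])
        elif tag == "a":
            # start buffering text until the matching </a>
            self._in_anchor = True
            self._buf = []

    def handle_data(self, data):
        if self._in_anchor:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            self.anchor_texts.append("".join(self._buf).strip())
            self._in_anchor = False


def iframe_urls(page_html, base_url):
    """Step 1: absolute URLs of every iframe on the parent page."""
    parser = _LinkCollector()
    parser.feed(page_html)
    return [urljoin(base_url, src) for src in parser.iframe_srcs]


def anchor_texts(iframe_html):
    """Step 2: text of each <a> in HTML fetched from an iframe URL."""
    parser = _LinkCollector()
    parser.feed(iframe_html)
    return parser.anchor_texts
```

In practice, the input to anchor_texts() would be something like requests.get(url).text for each URL returned by iframe_urls().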
Here is an example:
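import re
import requests

# The iframe's src: the SoundCloud widget page the browser loads separately
url = 'https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/310079005&color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false'
response = requests.get(url)

# response.content is raw bytes, so the patterns are bytes literals; each grabs
# the first quoted value after artist":" / title":" via lookbehind/lookahead
artist = re.search(b'(?<=artist":")(.*?)(?=")', response.content).group(0).decode("utf-8")
song = re.search(b'(?<=title":")(.*?)(?=")', response.content).group(0).decode("utf-8")
print("%s - %s" % (artist, song))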
Output:
Private Life - Lost Boy
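The lookups above call .group(0) directly, which raises AttributeError if the widget page ever lacks one of the fields, because re.search() returns None when nothing matches. A slightly more defensive sketch of the same lookbehind/lookahead technique follows; the helper name extract_field and the sample bytes are hypothetical.

```python
import re


def extract_field(content, key):
    """Return the first quoted value after key":" in raw bytes, or None.

    Uses the same lookbehind/lookahead pattern as the example above; like it,
    the value ends at the first double quote, so escaped quotes are not handled.
    """
    # re.escape keeps the lookbehind a fixed-width literal for any key
    pattern = rb'(?<=' + re.escape(key.encode("utf-8")) + rb'":")(.*?)(?=")'
    match = re.search(pattern, content)
    return match.group(0).decode("utf-8") if match else None
```

For example, extract_field(response.content, "artist") would stand in for the first lookup above and return None instead of raising when the field is missing.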
Source: https://stackoverflow.com/questions/42589907/extract-iframe-content-using-beautifulsoup