问题
I'm using the BeautifulSoup module for scraping the total number of followers and total number of tweets from a Twitter account. However, when I tried inspecting the elements of the respective fields on the web page, I found that both the fields are enclosed inside same set of html attributes:
Followers
<a class="ProfileNav-stat ProfileNav-stat--link u-borderUserColor u-textCenter js-tooltip js-nav u-textUserColor" data-nav="followers" href="/IAmJericho/followers" data-original-title="2,469,681 Followers">
<span class="ProfileNav-label">Followers</span>
<span class="ProfileNav-value" data-is-compact="true">2.47M</span>
</a>
Tweet count
<a class="ProfileNav-stat ProfileNav-stat--link u-borderUserColor u-textCenter js-tooltip js-nav" data-nav="tweets" tabindex="0" data-original-title="21,769 Tweets">
<span class="ProfileNav-label">Tweets</span>
<span class="ProfileNav-value" data-is-compact="true">21.8K</span>
</a>
The mining script that I wrote:
import requests
import urllib2
from bs4 import BeautifulSoup
link = "https://twitter.com/iamjericho"
r = urllib2.urlopen(link)
src = r.read()
res = BeautifulSoup(src)
followers = ''
for e in res.findAll('span', {'data-is-compact':'true'}):
followers = e.text
print followers
However, since the values of both, the total tweet count and total number of followers are enclosed inside same set of HTML attributes, ie inside a span tag with class = "ProfileNav-value" and data-is-compact = "true", I only get the results of the total number of followers returned by running the above script.
How could I possibly extract two sets of information enclosed inside similar HTML attributes from BeautifulSoup?
回答1:
In this case, one way to achieve it, is to check that data-is-compact="true" only appears twice for each piece of data you want to extract, and also you know that tweets is first and followers second, so you can have a list with those titles in same order and use a zip to join them in a tuple to print both at same time, like:
import urllib2
from bs4 import BeautifulSoup
profile = ['Tweets', 'Followers']
link = "https://twitter.com/iamjericho"
r = urllib2.urlopen(link)
src = r.read()
res = BeautifulSoup(src)
followers = ''
for p, d in zip(profile, res.find_all('span', { 'data-is-compact': "true"})):
print p, d.text
It yields:
Tweets 21,8K
Followers 2,47M
来源:https://stackoverflow.com/questions/31225803/beautifulsoup-scraping-different-data-sets-having-same-set-of-attributes-in-the