Web Scraping Python (BeautifulSoup,Requests)

Submitted by 大兔子大兔子 on 2020-05-28 09:55:14

Question


I am learning web scraping with Python, but I can't get the desired result. Below are my code and its output.

Code:

import bs4,requests
url = "https://twitter.com/24x7chess"
r = requests.get(url)
soup = bs4.BeautifulSoup(r.text,"html.parser")
soup.find_all("span",{"class":"account-group-inner"})
[]

Here is what I was trying to scrape

https://i.stack.imgur.com/tHo5S.png

I keep getting an empty list. Please help.


Answer 1:


Try this. It should give you the items you are probably looking for. Selenium combined with BeautifulSoup is easy to work with, so that is how I've written it:

from bs4 import BeautifulSoup
from selenium import webdriver

# Drive a real browser so that the page's JavaScript runs
driver = webdriver.Chrome()

driver.get("https://twitter.com/24x7chess")
# Parse the rendered HTML, then close the browser
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()
for title in soup.select("#page-container"):
    name = title.select(".ProfileHeaderCard-nameLink")[0].text.strip()
    location = title.select(".ProfileHeaderCard-locationText")[0].text.strip()
    # The .ProfileNav-value elements are read by position, in page order
    tweets = title.select(".ProfileNav-value")[0].text.strip()
    following = title.select(".ProfileNav-value")[1].text.strip()
    followers = title.select(".ProfileNav-value")[2].text.strip()
    likes = title.select(".ProfileNav-value")[3].text.strip()
    print(name, location, tweets, following, followers, likes)

Output:

akul chhillar New Delhi, India 214 44 17 5



Answer 2:


Sites like Twitter load their content dynamically, and what gets loaded can depend on the browser you are using, among other things. Because of this, some elements of the page are lazily loaded: the DOM is built up dynamically, partly in response to user actions. The tag you see in your browser's Inspect Element panel belongs to that fully built, dynamically populated HTML. The response you get from requests, however, is the HTML before any of that has happened: a bare DOM still waiting for its elements to be loaded. That is why the element you are searching for is missing when you fetch the page with the requests module.

I suggest using Selenium WebDriver to scrape dynamic JavaScript pages.
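To make the difference concrete, here is a minimal offline sketch. It assumes a made-up page (STATIC_HTML is hypothetical, not Twitter's real markup) in which a script would add the span only once a browser runs it. It uses the standard library's html.parser instead of BeautifulSoup purely so that it runs without extra packages or network access; ClassCounter is a helper name invented for this illustration.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the raw HTML a plain HTTP request might receive.
# The span is only created by the script, which a static parser never runs.
STATIC_HTML = """
<html><body>
<div id="page-container"></div>
<script>
  var s = document.createElement("span");
  s.className = "account-group-inner";
  document.getElementById("page-container").appendChild(s);
</script>
</body></html>
"""


class ClassCounter(HTMLParser):
    """Count start tags with a given name and CSS class in static HTML."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.count += 1


counter = ClassCounter("span", "account-group-inner")
counter.feed(STATIC_HTML)
static_matches = counter.count
print(static_matches)  # 0: the span exists only after the script has run
```

A browser driven by Selenium would execute the script first, so the same search against driver.page_source could find the element.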




Answer 3:


You could have done the whole thing with requests rather than Selenium:

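import requests
from bs4 import BeautifulSoup as bs
import re

r = requests.get('https://twitter.com/24x7chess')
soup = bs(r.content, 'lxml')
# The bio is read from the page's description meta tag; newlines are collapsed to spaces
bio = re.sub(r'\n+',' ', soup.select_one('[name=description]')['content'])
stats_headers = ['Tweets', 'Following', 'Followers', 'Likes']
# Collect the value of every data-count attribute, in page order
stats = [item['data-count'] for item in soup.select('[data-count]')]
# Pair each header with the value at the same position
data = dict(zip(stats_headers, stats))

print(bio, data)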
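One caveat with the pairing step above: zip stops at the shorter of its inputs, so if the page exposed more or fewer data-count attributes than the four expected, the labels could misalign or be dropped without any error. Below is a minimal sketch of a stricter pairing. pair_stats is a hypothetical helper, not part of requests or BeautifulSoup, and the sample values are simply the numbers shown in Answer 1's output.

```python
def pair_stats(headers, values):
    """Pair labels with scraped values, refusing to guess on a length mismatch."""
    if len(headers) != len(values):
        raise ValueError(
            f"expected {len(headers)} values, got {len(values)}"
        )
    return dict(zip(headers, values))


stats_headers = ['Tweets', 'Following', 'Followers', 'Likes']
# Hypothetical scraped values (taken from the output shown in Answer 1)
sample_values = ['214', '44', '17', '5']

data = pair_stats(stats_headers, sample_values)
print(data)
```

In the answer's script, this would replace the dict(zip(...)) line, with stats passed in place of sample_values.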



Source: https://stackoverflow.com/questions/46860838/web-scraping-python-beautifulsoup-requests
