Beautiful Soup returns empty list

Submitted by 非 Y 不嫁゛ on 2019-12-24 11:02:41

Question


I am new to web scraping. I have been given a task to extract data from the Kaggle page requested in the code below.

I am choosing the "comments" dataset. Below is my scraping code.

import requests
from bs4 import BeautifulSoup

url = 'https://www.kaggle.com/hacker-news/hacker-news'
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)
print(response.status_code)  # check that the request succeeded

# Parse the HTML returned by the server and look for the table body
soup = BeautifulSoup(response.content, 'html.parser')
print(soup.find_all('tbody', class_='TableBody-kSbjpE jGqIxa'))

When I execute the last command, it returns [].

So I am stuck here. I know we can get the data from a kernel, but just for practice: where am I going wrong? Am I choosing the wrong class? I want to scrape the data and save it to a CSV file or to a NoSQL database, preferably Cassandra.


Answer 1:


You are getting [] because the data you want to scrape comes from an API that is called after the web page loads, so the HTML you receive does not contain that class.
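The effect can be reproduced without any network access. The sketch below is only an illustration: both HTML strings are made-up stand-ins (not Kaggle's real markup) for what the server sends before JavaScript runs and for what the browser's element inspector shows afterwards. The same find_all call from the question is run on each.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML the server sends before any JavaScript runs.
STATIC_HTML = """
<html>
  <body>
    <div id="site-content"></div>
    <script src="/static/app.js"></script>
  </body>
</html>
"""

# Hypothetical stand-in for the DOM after the browser has run the script,
# which is what the element inspector displays.
RENDERED_HTML = """
<html>
  <body>
    <div id="site-content">
      <table>
        <tbody class="TableBody-kSbjpE jGqIxa">
          <tr><td>example row</td></tr>
        </tbody>
      </table>
    </div>
  </body>
</html>
"""


def find_table_bodies(html):
    """Run the same search as in the question on the given HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all("tbody", class_="TableBody-kSbjpE jGqIxa")


static_matches = find_table_bodies(STATIC_HTML)
rendered_matches = find_table_bodies(RENDERED_HTML)

print(len(static_matches))    # nothing to find in the static HTML
print(len(rendered_matches))  # the element exists only in the rendered DOM
```

requests only ever receives the first kind of document, which is why the search comes back empty even though the selector itself is written correctly.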

Open your browser's developer tools and look at the Network tab. There you will find the request that returns the data you want to scrape, so you have to make your request to that URL instead.

You can retrieve the data from that URL; the Preview tab of the request shows all of it.

If you know Python well, you can also use Scrapy to scrape the data:
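As a rough sketch of that approach, the code below fetches a JSON endpoint and writes the records to CSV, which is what the question asks for. Several things here are assumptions: the endpoint URL is not given in this thread, so fetch_json takes whatever URL you copy from the Network tab; the response is assumed to be a JSON list of flat objects; and the field names in sample_payload are invented for the demonstration. The standard-library urllib.request is used in place of requests only to keep the sketch free of dependencies.

```python
import csv
import io
import json
import urllib.request


def fetch_json(url):
    """Download `url` and decode its JSON body.

    `url` must be the request URL copied from the browser's Network tab;
    it is not known here, so this function is not called below.
    """
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def records_to_csv(records, fileobj):
    """Write a list of flat dicts to `fileobj` as CSV.

    The header is the union of all keys, in first-seen order.
    """
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)


# Hypothetical payload standing in for what fetch_json(url) might return.
sample_payload = json.loads(
    '[{"id": 1, "text": "first comment"}, {"id": 2, "text": "second comment"}]'
)

buffer = io.StringIO()
records_to_csv(sample_payload, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of an in-memory buffer, pass a file opened with open("comments.csv", "w", newline="") as the second argument. If the real response nests its records under some key, extract that list before calling records_to_csv.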

https://doc.scrapy.org/en/latest/intro/overview.html




Answer 2:


Even though you can see the tbody element with class 'TableBody-kSbjpE jGqIxa' in the element inspector, the response to your request does not contain this class. See for yourself with print(soup.prettify()). This is most likely because you're not requesting the correct URL.

This may not be something you're aware of, but as an FYI: you don't actually need to scrape with BeautifulSoup; you can get a list of all the available datasets from the Kaggle API. Once you have it installed and configured, you can get the dataset: kaggle datasets download -d . Here's more info if you wish to proceed with the API instead: https://github.com/Kaggle/kaggle-api



Source: https://stackoverflow.com/questions/51839937/beautiful-soup-returns-empty-list
