Scrape data from a website that loads the next page when scrolled to the bottom, using Python and BeautifulSoup

寵の児 submitted on 2020-01-15 03:49:26

Question


How can I use Python and BeautifulSoup to scrape data from a website that automatically loads the next page when you scroll to the bottom (i.e. an endlessly extending page)? Is there a general approach, or does it need to be tailored to each website?

Example of website: http://statigr.am/tag/cat/#/list


Answer 1:


If the page has dynamic behavior, such as loading additional content via AJAX calls (as statigr.am does here), you have two options: drive a real browser with Selenium, or tailor your scraper to the specific website and simulate the AJAX calls yourself.

To tailor a scraper to statigr.am, use the browser's developer tools to see which requests are made after the page loads. You may notice that this XHR request is made first:

http://statigr.am/controller_nl.php?action=nlGetMethod&method=mediasTag&value=cat&max_id=1371516699343

It returns JSON with all the data you need. The response also contains a next_max_tag_id key in its pagination dictionary, which is used as the max_id for the next AJAX request to controller_nl.php. So I'd simulate these requests with urllib2 (urllib.request in Python 3) or requests and parse the responses with the json module. It looks like there is no need to parse HTML with BeautifulSoup at all.
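Here is a minimal sketch of that loop, using only the standard library. The endpoint and query parameters come from the XHR request above, and pagination / next_max_tag_id are the keys mentioned above. Everything else is my own assumption for illustration: the helper names build_url, fetch_json and iter_pages, the max_pages safety cap, and the idea that max_id can be omitted on the first request. I have not assumed anything about the rest of the JSON layout, so the generator just yields each decoded page as-is.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the XHR request shown above.
BASE_URL = "http://statigr.am/controller_nl.php"


def build_url(tag, max_id=None):
    """Build the controller_nl.php URL for one page of results (hypothetical helper)."""
    params = {"action": "nlGetMethod", "method": "mediasTag", "value": tag}
    if max_id is not None:
        # Assumption: max_id may be left out when no starting id is known.
        params["max_id"] = max_id
    return BASE_URL + "?" + urlencode(params)


def fetch_json(url):
    """Download a URL and decode its body as JSON."""
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def iter_pages(tag, max_id=None, fetch=fetch_json, max_pages=5):
    """Yield decoded JSON pages, following pagination.next_max_tag_id.

    `fetch` is injectable so the loop can be exercised without network access;
    `max_pages` is an arbitrary cap so the loop cannot run forever.
    """
    for _ in range(max_pages):
        page = fetch(build_url(tag, max_id))
        yield page
        # Stop when the response no longer advertises a next page.
        max_id = page.get("pagination", {}).get("next_max_tag_id")
        if not max_id:
            break
```

You would call it as `for page in iter_pages("cat", max_id=1371516699343): ...` and pull whatever fields you need out of each page. Inspect one response in the browser first to see which keys hold the media entries.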

Hope that helps.



Source: https://stackoverflow.com/questions/17144536/scrape-data-from-website-that-turned-next-page-when-scrolled-to-bottom-using-pyt
