scrape websites with infinite scrolling

前端 未结 3 1954
伪装坚强ぢ
伪装坚强ぢ 2020-11-29 01:29

I have written many scrapers but I am not really sure how to handle infinite scrollers. These days most website etc, Facebook, Pinterest has infinite scrollers.

3条回答
  •  攒了一身酷
    2020-11-29 01:49

    Most sites that have infinite scrolling do (as Lattyware notes) have a proper API as well, and you will likely be better served by using this rather than scraping.

    But if you must scrape...

    Such sites are using JavaScript to request additional content from the site when you reach the bottom of the page. All you need to do is figure out the URL of that additional content and you can retrieve it. Figuring out the required URL can be done by inspecting the script, by using the Firefox Web console, or by using a debug proxy.

    For example, open the Firefox Web Console, turn off all the filter buttons except Net, and load the site you wish to scrape. You'll see all the files as they are loaded. Scroll the page while watching the Web Console and you'll see the URLs being used for the additional requests. Then you can request that URL yourself and see what format the data is in (probably JSON) and get it into your Python script.

提交回复
热议问题