How do I crawl an infinite-scrolling page?

一生所求 2021-01-03 01:06

I'm trying to build something that crawls the content from a page with infinite scroll. However, I can't get the stuff from below the first 'break'. How do I do this?

3 answers
  •  清歌不尽
    2021-01-03 01:34

    This answer should be relevant for a large percentage of infinite scrollers, though your mileage may vary.

    Most infinite scrollers work by keeping track of an offset position and grabbing the next chunk of items starting at that offset. It's exactly the same as how paging works when you step through

    < Previous 1 2 3 4 5 Next >

    except that the offsets are stored by the page and used to make a fresh request as you scroll.
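To make the paging analogy concrete, here is a minimal sketch of how the page numbers in a classic pager map to offsets. The function name `page_to_offset` and the page size of 10 are illustrative assumptions, not taken from any particular site.

```python
def page_to_offset(page: int, page_size: int = 10) -> int:
    """Return the zero-based item offset at which a 1-based page starts."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * page_size


# Pages 1..5 of a classic pager correspond to these offsets.
offsets = [page_to_offset(p) for p in range(1, 6)]
print(offsets)  # prints [0, 10, 20, 30, 40]
```

An infinite scroller does the same arithmetic, only it advances the offset when you reach the bottom of the list instead of when you click "Next".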

    With this in mind, if you open up the developer toolbar in Chrome or Firefox and check out the network tab, you will most likely see requests coming in as you scroll down.

    Look at the parameters on the request, and you will most likely see something like

    GET /api/v2/books?offset=100&count=10
    GET /api/v2/books?offset=110&count=10
    GET /api/v2/books?offset=120&count=10


    Knowing this, you can skip scraping the target HTML entirely and make your requests directly against the site's internal URI.
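As a rough sketch of what that could look like in Python, the loop below steps the offset until the endpoint returns a short chunk. The host `example.com`, the assumption that the endpoint returns a JSON array, and the names `fetch_chunk` and `crawl` are all hypothetical; substitute whatever you actually see in the network tab.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint; replace with the URI observed in the network tab.
BASE_URL = "https://example.com/api/v2/books"


def fetch_chunk(offset, count):
    """Fetch one chunk of items; assumes the response body is a JSON array."""
    query = urlencode({"offset": offset, "count": count})
    with urlopen(f"{BASE_URL}?{query}", timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def crawl(fetch=fetch_chunk, count=10, max_items=1000):
    """Collect items chunk by chunk until a short chunk or max_items is reached."""
    items = []
    offset = 0
    while len(items) < max_items:
        chunk = fetch(offset, count)
        items.extend(chunk)
        if len(chunk) < count:
            # A short (or empty) chunk means there is nothing left to load.
            break
        offset += count
    return items[:max_items]
```

Calling `crawl()` with no arguments would hit the hypothetical endpoint. The `fetch` parameter is there so you can pass in a stand-in function while testing, and `max_items` is a safety cap in case the feed really never ends. Real sites may also require headers, cookies, or a cursor token instead of a numeric offset.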
