How do I crawl an infinite-scrolling page?

走远了吗. Submitted on 2019-12-18 17:04:32

Question


I'm trying to build something that crawls the content from a page with infinite scroll. However, I can't get the stuff from below the first 'break'. How do I do this?


Answer 1:


Infinite scrolling is almost always implemented in JavaScript using AJAX or a related technology. It is therefore not enough for your web crawler to fetch the HTML and parse it; it must download and execute the JavaScript, or at least scan it for the AJAX calls.

Full JavaScript execution is probably the best option (i.e., the most likely to work), but it is also probably the hardest to do.

Scanning the JavaScript for AJAX requests, or for functions that make AJAX calls and then manipulate the DOM, will probably be easier than full JavaScript execution.
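As a rough illustration of the scanning approach, the sketch below searches a script's source text for URL literals passed to a few common AJAX helpers. The list of helpers (`fetch`, `$.get`, `$.post`, `$.getJSON`, `$.ajax`) and the function name `find_ajax_urls` are assumptions made for this example; a regular expression like this will miss minified code and URLs that are assembled at runtime, so treat it as a starting point only.

```python
import re

# Matches calls such as fetch("/x"), $.get('/x') or $.ajax({ url: "/x", ... }).
# The set of helper names is an assumption; extend it for the site you crawl.
AJAX_CALL = re.compile(
    r"""(?:fetch|\$\.(?:get|post|getJSON|ajax))\s*\(\s*   # helper name and "("
        (?:\{\s*url\s*:\s*)?                               # optional {url: ...} form
        (['"])(?P<url>[^'"]+)\1                            # quoted URL literal
    """,
    re.VERBOSE,
)


def find_ajax_urls(js_source):
    """Return the URL literals found in recognised AJAX calls, in source order."""
    return [match.group("url") for match in AJAX_CALL.finditer(js_source)]
```

The URLs it returns are only candidates; confirm them in the browser's network tab before building a crawler around them.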




Answer 2:


This answer should apply to a large percentage of infinite scrollers, though your mileage may vary.

Most infinite scrollers work by keeping an offset position and fetching the next chunk of items starting from that offset. This is exactly how ordinary paging works when you step through

< Previous 1 2 3 4 5 Next >

except that the offsets are stored by the page and used to make a fresh request each time you reach the bottom.

With this in mind, if you open the developer tools in Chrome or Firefox and watch the network tab, you will most likely see new requests appear as you scroll down.

Look at the parameters on the request, and you will most likely see something like

GET /api/v2/books?offset=100&count=10
GET /api/v2/books?offset=110&count=10
GET /api/v2/books?offset=120&count=10

Knowing this, you can skip scraping the rendered HTML altogether and send your requests directly to the site's internal endpoint.
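A minimal sketch of that idea follows, assuming the endpoint accepts `offset` and `count` query parameters as in the requests above, returns a JSON array, and returns an empty array once the items run out. The function names, the `max_pages` safety limit, and the `example.com` host are hypothetical; check the real parameter names and response shape in the network tab first.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url):
    """Download a URL and decode the body as JSON (assumed to be an array)."""
    with urlopen(url) as response:
        return json.load(response)


def crawl_offsets(base_url, start=0, count=10, max_pages=100, fetch=fetch_json):
    """Yield items page by page until the endpoint returns an empty page.

    `fetch` can be swapped out, e.g. to add headers, rate limiting or a fake
    for testing. `max_pages` guards against an endpoint that never runs dry.
    """
    offset = start
    for _ in range(max_pages):
        url = f"{base_url}?{urlencode({'offset': offset, 'count': count})}"
        items = fetch(url)
        if not items:
            return
        yield from items
        offset += count  # same step the page itself uses: 100, 110, 120, ...


# Usage (hypothetical host):
# for book in crawl_offsets("https://example.com/api/v2/books"):
#     print(book)
```

Passing the fetch function as a parameter keeps the paging logic separate from the network code, which makes it easy to add a delay between requests so the crawler does not hammer the site.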




Answer 3:


An AJAX request is no different from any other HTTP request. You simply make the request, parse the result, and there you have your data.

It can take some experience if you haven't done it before, but it sounds like a good learning experience.
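To make the "request, then parse" steps concrete, here is a small sketch using only the Python standard library. It assumes the endpoint returns a JSON array of objects with a `title` field, which is a made-up response shape for illustration. The `X-Requested-With: XMLHttpRequest` header is what jQuery sends with its own AJAX calls; some servers check for it, many do not, so including it is a precaution rather than a requirement.

```python
import json
from urllib.request import Request, urlopen


def build_ajax_request(url):
    """Build a request that looks like the one the page's own script sends."""
    return Request(url, headers={
        "Accept": "application/json",
        "X-Requested-With": "XMLHttpRequest",
    })


def parse_titles(body):
    """Extract the (hypothetical) `title` field from a JSON array of objects."""
    return [item["title"] for item in json.loads(body)]


def fetch_titles(url):
    """Make the request and parse the result in one step."""
    with urlopen(build_ajax_request(url)) as response:
        return parse_titles(response.read())
```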



Source: https://stackoverflow.com/questions/12996392/how-do-i-crawl-an-infinite-scrolling-page
