How does scrapy-splash handle infinite scrolling?

栀梦 2020-12-09 13:25

I want to reverse engineer the content that is generated when scrolling down a webpage. The page in question is https://www.crowdfunder.com/user/following_page/80159?

2 answers
  •  猫巷女王i
    2020-12-09 13:56

    To scroll a page you can write a custom rendering script (see http://splash.readthedocs.io/en/stable/scripting-tutorial.html), something like this:

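    function main(splash)
        local num_scrolls = 10      -- how many times to scroll down
        local scroll_delay = 1.0    -- seconds to wait after each scroll

        -- Wrap JavaScript functions so they can be called from Lua.
        local scroll_to = splash:jsfunc("window.scrollTo")
        local get_body_height = splash:jsfunc(
            "function() {return document.body.scrollHeight;}"
        )
        -- Load the page, then wait for the initial render.
        assert(splash:go(splash.args.url))
        splash:wait(splash.args.wait)

        -- Scroll to the current bottom each time so new content can load.
        for _ = 1, num_scrolls do
            scroll_to(0, get_body_height())
            splash:wait(scroll_delay)
        end
        return splash:html()
    end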
    

    To run this script, use the 'execute' endpoint instead of the render.html endpoint:

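    script = """ """  # put the Lua script above between the triple quotes
    # 'wait' is read by the script as splash.args.wait
    scrapy_splash.SplashRequest(url, self.parse,
                                endpoint='execute',
                                args={'wait': 2, 'lua_source': script}, ...)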
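
To make it clearer what the 'execute' endpoint receives, here is a minimal sketch that builds the same kind of request with only the Python standard library, without scrapy-splash. It assumes a Splash instance listening at http://localhost:8050 and uses https://example.com/ as a placeholder page; both are assumptions, and `build_execute_request` is a hypothetical helper name, not part of any library. The request is only constructed here, not sent.

```python
import json
import urllib.request

# The Lua scrolling script from the answer above, kept as a plain string.
LUA_SCROLL_SCRIPT = """
function main(splash)
    local num_scrolls = 10
    local scroll_delay = 1.0

    local scroll_to = splash:jsfunc("window.scrollTo")
    local get_body_height = splash:jsfunc(
        "function() {return document.body.scrollHeight;}"
    )
    assert(splash:go(splash.args.url))
    splash:wait(splash.args.wait)

    for _ = 1, num_scrolls do
        scroll_to(0, get_body_height())
        splash:wait(scroll_delay)
    end
    return splash:html()
end
"""


def build_execute_request(splash_base, page_url, wait=2.0):
    """Build (but do not send) a POST request for Splash's execute endpoint.

    Hypothetical helper: the JSON keys become splash.args.* inside the script.
    """
    payload = {"lua_source": LUA_SCROLL_SCRIPT, "url": page_url, "wait": wait}
    return urllib.request.Request(
        splash_base.rstrip("/") + "/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local Splash address and placeholder target page.
request = build_execute_request("http://localhost:8050", "https://example.com/")
```

With a Splash instance actually running at that address, passing `request` to `urllib.request.urlopen` should return the scrolled page's HTML, since the script ends with `return splash:html()`.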
    
