Scrapy Shell and Scrapy Splash

刺人心 2020-12-07 14:53

We've been using the scrapy-splash middleware to pass the scraped HTML source through the Splash JavaScript engine running inside a Docker container.

If we

3 Answers
  •  没有蜡笔的小新
    2020-12-07 15:51

    Just wrap the URL you want to open in the shell in the Splash HTTP API.

    So you would want something like:

    scrapy shell 'http://localhost:8050/render.html?url=http://domain.com/page-with-javascript.html&timeout=10&wait=0.5'
    

    where localhost:port is the address where your Splash service is running,
    url is the URL you want to crawl (don't forget to URL-quote it!),
    render.html is one of the available HTTP API endpoints; in this case it returns the rendered HTML page,
    timeout is the rendering timeout in seconds,
    wait is the time in seconds to wait for JavaScript to execute before reading/saving the HTML.
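
The URL-quoting step matters as soon as the target page has its own query string, since an unescaped `&` would be read as a separator between Splash parameters. Here is a minimal sketch of building the quoted URL with Python's standard library. It assumes Splash is listening on localhost:8050; `build_splash_url` and the `?a=1&b=2` query string on the target are made up for illustration and are not part of Splash or Scrapy.

```python
from urllib.parse import urlencode

# Assumption: Splash is reachable on localhost:8050 (adjust to your container's port mapping).
SPLASH_RENDER_ENDPOINT = "http://localhost:8050/render.html"

# Placeholder target; the query string is only here to show why quoting is needed.
target_url = "http://domain.com/page-with-javascript.html?a=1&b=2"


def build_splash_url(url, timeout=10, wait=0.5, endpoint=SPLASH_RENDER_ENDPOINT):
    """Hypothetical helper: return a render.html URL with the target percent-encoded."""
    # urlencode percent-encodes each value, so the '&' and '=' inside the
    # target URL cannot be confused with Splash's own parameters.
    query = urlencode({"url": url, "timeout": timeout, "wait": wait})
    return f"{endpoint}?{query}"


splash_url = build_splash_url(target_url)

# Print a command line that can be pasted into a terminal.
print(f"scrapy shell '{splash_url}'")
```

The single quotes around the printed URL keep the shell from interpreting the remaining `&` characters that separate the Splash parameters.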
