using proxy with scrapy-splash

Submitted by 霸气de小男生 on 2019-12-09 06:52:39

Question


I'm trying to use a proxy (ProxyMesh) alongside scrapy-splash. I have the following (relevant) code:

PROXY = """splash:on_request(function(request)
    request:set_proxy{
        host = http://us-ny.proxymesh.com,
        port = 31280,
        username = username,
        password = secretpass,
    }
    return splash:html()
end)"""

and in start_requests:

def start_requests(self):
    for url in self.start_urls:
        print(url)
        yield SplashRequest(url, self.parse,
            endpoint='execute',
            args={'wait': 5,
                  'lua_source': PROXY,
                  'js_source': 'document.body'})

But it does not seem to work: self.parse is never called. If I change the endpoint to 'render.html' I do hit the self.parse method, but when I inspect the headers (response.headers) I can see that the request is not going through the proxy. I confirmed this by setting http://checkip.dyndns.org/ as the start URL and seeing my old IP address in the parsed response.

What am I doing wrong?


Answer 1:


You should add a 'proxy' argument to the SplashRequest object.

def start_requests(self):
    for url in self.start_urls:
        print(url)
        yield SplashRequest(url, self.parse,
            endpoint='execute',
            args={'wait': 5,
                  'lua_source': PROXY,
                  'js_source': 'document.body',
                  'proxy': 'http://proxy_ip:proxy_port'})
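Since ProxyMesh requires authentication, the credentials can be embedded in the proxy URL itself (`http://user:pass@host:port`). A minimal sketch of building the `args` dict this way, using a hypothetical helper; the host, port, and credentials below are placeholders, not real ProxyMesh values:

```python
# Hypothetical helper: build the args dict for a SplashRequest that routes
# the request through an authenticated HTTP proxy. Host, port, and
# credentials are placeholders for illustration only.
def splash_proxy_args(host, port, username=None, password=None, wait=5):
    auth = f"{username}:{password}@" if username else ""
    return {
        "wait": wait,
        "proxy": f"http://{auth}{host}:{port}",  # consumed by Splash, not Scrapy
    }

args = splash_proxy_args("us-ny.proxymesh.com", 31280, "username", "secretpass")
# args["proxy"] is "http://username:secretpass@us-ny.proxymesh.com:31280"
```

The resulting dict would then be passed as `args=` to `SplashRequest`, replacing the hard-coded `'proxy'` entry above.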


Source: https://stackoverflow.com/questions/43646438/using-proxy-with-scrapy-splash
