Using Tor proxy with scrapy

Posted by 拜拜、爱过 on 2019-12-03 07:54:51

Question


I need help setting up Tor on Ubuntu and using it within the Scrapy framework.

I did some research and found this guide:

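import logging
import telnetlib  # removed from the standard library in Python 3.13
import time

from scrapy.downloadermiddlewares.retry import RetryMiddleware

logger = logging.getLogger(__name__)


class RetryChangeProxyMiddleware(RetryMiddleware):

    def _retry(self, request, reason, spider):
        logger.info('Changing proxy')
        # Talk to Tor's control port (9051) and ask for a new circuit.
        tn = telnetlib.Telnet('127.0.0.1', 9051)
        # This banner is printed by the telnet client, not by Tor, so the
        # call simply waits out its 2-second timeout.
        tn.read_until(b"Escape character is '^]'.", 2)
        tn.write(b'AUTHENTICATE "267765"\r\n')
        tn.read_until(b"250 OK", 2)
        tn.write(b"signal NEWNYM\r\n")
        tn.read_until(b"250 OK", 2)
        tn.write(b"quit\r\n")
        tn.close()
        time.sleep(3)  # give Tor a moment to build the new circuit
        logger.info('Proxy changed')
        return RetryMiddleware._retry(self, request, reason, spider)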
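
Because `telnetlib` is no longer available from Python 3.13 on, the same control-port exchange could also be done over a plain socket. The following is only a minimal sketch, not part of the guide: the function names are hypothetical, the password is assumed to contain no quotes or backslashes, and each command is assumed to get a single-line reply.

```python
import socket


def build_newnym_commands(password):
    """Return the control-port lines that authenticate, request a new circuit, and quit."""
    # Assumption: the password needs no escaping inside the quoted string.
    return [
        'AUTHENTICATE "{}"\r\n'.format(password).encode("ascii"),
        b"SIGNAL NEWNYM\r\n",
        b"QUIT\r\n",
    ]


def is_success(reply_line):
    """Control-port replies that start with status code 250 indicate success."""
    return reply_line.startswith(b"250")


def request_new_identity(password, host="127.0.0.1", port=9051, timeout=5.0):
    """Send the commands one by one; return True only if every reply starts with 250."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        with conn.makefile("rb") as reader:
            for command in build_newnym_commands(password):
                conn.sendall(command)
                # Assumption: one reply line per command is enough to read here.
                if not is_success(reader.readline()):
                    return False
    return True
```

Inside `_retry`, a call such as `request_new_identity("267765")` would stand in for the telnet lines, with the `time.sleep(3)` kept afterwards.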

Then enable the middleware in settings.py:

DOWNLOADER_MIDDLEWARES = {
    'spider.middlewares.RetryChangeProxyMiddleware': 600,
}

Then you just need to send the requests through the local Tor proxy (polipo), which can be done with:

tsocks scrapy crawl spider
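
As an alternative to wrapping the whole process in `tsocks`, Scrapy can be pointed at an HTTP proxy per request through `request.meta['proxy']`. Below is a rough sketch of that idea, not something taken from the guide: `TorProxyMiddleware` is a hypothetical name, and the address assumes polipo is listening on its default port 8123 and forwarding to Tor.

```python
class TorProxyMiddleware:
    """Hypothetical downloader middleware that routes every request through a local HTTP proxy."""

    # Assumption: polipo runs locally on its default port and forwards to Tor.
    proxy_url = "http://127.0.0.1:8123"

    def process_request(self, request, spider):
        # Scrapy uses the proxy named in request.meta['proxy'] for this request.
        request.meta["proxy"] = self.proxy_url
        # Returning None lets Scrapy continue processing the request normally.
        return None
```

It would be registered in `DOWNLOADER_MIDDLEWARES` like the retry middleware above, typically with an order below the 750 of the built-in `HttpProxyMiddleware`.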

Can anyone confirm that this method works and that you get different IPs?
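
One way to check this outside Scrapy might be to fetch the visible IP through the proxy several times and ask Tor for a new circuit in between. The sketch below rests on several assumptions: the proxy address (polipo on its default port 8123), the IP echo service (`https://api.ipify.org`, which returns the caller's IP as plain text), and the helper names are all chosen for illustration only.

```python
import urllib.request

PROXY = "http://127.0.0.1:8123"        # assumption: local polipo forwarding to Tor
IP_ECHO_URL = "https://api.ipify.org"  # assumption: any plain-text IP echo service would do


def fetch_exit_ip(proxy=PROXY, url=IP_ECHO_URL, timeout=30):
    """Return the IP address the echo service sees when the request goes through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("ascii").strip()


def collect_exit_ips(fetch, rotate, attempts=3):
    """Call fetch() and then rotate(), `attempts` times; return the IPs in the order seen."""
    seen = []
    for _ in range(attempts):
        seen.append(fetch())
        rotate()  # e.g. send NEWNYM to the control port and wait a few seconds
    return seen
```

If `len(set(collect_exit_ips(fetch_exit_ip, rotate)))` is greater than 1, the identity is changing; `rotate` here stands for whatever callable sends the NEWNYM signal and waits. Seeing the same IP twice does not prove the setup is broken, since Tor may reuse an exit node or throttle NEWNYM requests that arrive too quickly.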


Answer 1:


I was using this snippet: http://snipplr.com/view/66992/use-a-random-user-agent-for-each-request/

Update: broken link fixed



Source: https://stackoverflow.com/questions/11603423/using-tor-proxy-with-scrapy
