Get proxy ip address scrapy using to crawl

大城市里の小女人 提交于 2019-11-30 15:31:52

You can yield the first request to check your public IP, and compare this to the IP you see when you go to http://checkip.dyndns.org/ without using Tor/VPN. If they are not the same, scrapy is using a different IP obviously.

def start_reqests():
    yield Request('http://checkip.dyndns.org/', callback=self.check_ip)
    # yield other requests from start_urls here if needed

def check_ip(self, response):
    pub_ip = response.xpath('//body/text()').re('\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')[0]
    print "My public IP is: " + pub_ip

    # yield other requests here if needed    

The fastest option would be to use the scrapy shell and check for the meta to contain the proxy.

Start it from the project root:

$ scrapy shell http://google.com
>>> request.meta
{'handle_httpstatus_all': True, 'redirect_ttl': 20, 'download_timeout': 180, 'proxy': 'http://127.0.0.1:8123', 'download_latency': 0.4804518222808838, 'download_slot': 'google.com'}
>>> response.meta
{'download_timeout': 180, 'handle_httpstatus_all': True, 'redirect_ttl': 18, 'redirect_times': 2, 'redirect_urls': ['http://google.com', 'http://www.google.com/'], 'depth': 0, 'proxy': 'http://127.0.0.1:8123', 'download_latency': 1.5814828872680664, 'download_slot': 'google.com'}

This way you would check that middleware is configured correctly and the request is going through proxy.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!