Scrapy using pool of random proxies to avoid being banned

此生再无相见时 submitted on 2019-11-30 05:31:51
  1. There is no single correct answer here. Some proxies are not always available, so you have to check them now and then. Also, if you use the same proxy for every request, the server you are scraping may block its IP as well, but that depends on the security mechanisms that server has in place.
  2. Yes, because you don't know whether all the proxies in your pool support HTTPS. Alternatively, you could keep a single pool and add a field to each proxy indicating whether it supports HTTPS (see the sketch after this list).
  3. In your settings you are disabling the user agent middleware: 'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None. With that middleware disabled, the USER_AGENT setting won't have any effect.
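To make points 1 and 2 concrete, here is a minimal sketch of a custom downloader middleware that keeps one pool with an HTTPS flag per proxy and picks a random entry for each request. The proxy URLs, the flag name, and the module path in the comment are placeholders, not something from your project; you would still need logic to drop or re-check proxies that fail.

    import random


    class RandomProxyMiddleware(object):
        """Assign a random proxy from a pool to every outgoing request.

        Each entry carries an 'https' flag so that HTTPS requests only get
        proxies known to support HTTPS (placeholder data, replace with your own).
        """

        PROXIES = [
            {'url': 'http://10.0.0.1:8080', 'https': True},
            {'url': 'http://10.0.0.2:3128', 'https': False},
        ]

        def process_request(self, request, spider):
            if request.url.startswith('https://'):
                # Only hand out HTTPS-capable proxies for HTTPS requests.
                candidates = [p for p in self.PROXIES if p['https']]
            else:
                candidates = self.PROXIES
            if candidates:
                # HttpProxyMiddleware picks this up and routes the request.
                request.meta['proxy'] = random.choice(candidates)['url']


    # In settings.py you would enable it and, per point 3, leave the default
    # UserAgentMiddleware alone so that USER_AGENT still takes effect:
    #
    # DOWNLOADER_MIDDLEWARES = {
    #     'myproject.middlewares.RandomProxyMiddleware': 543,
    # }
    # USER_AGENT = 'Mozilla/5.0 (compatible; mybot)'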

There is already a library that does this: https://github.com/aivarsk/scrapy-proxies

Please download it from there. It has not been published on pypi.org yet, so you can't install it easily using pip or easy_install.
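If you go with that library, the configuration is a handful of entries in settings.py, roughly as shown in the project's README (a sketch only; setting names and middleware priorities may differ between versions, so check the repository):

    # settings.py -- based on the scrapy-proxies README; verify against the repo

    # Retry many times, since proxies often fail
    RETRY_TIMES = 10
    # Retry on most error codes, since proxies fail for different reasons
    RETRY_HTTP_CODES = [500, 503, 504, 400, 403, 404, 408]

    DOWNLOADER_MIDDLEWARES = {
        'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,
        'scrapy_proxies.RandomProxy': 100,
        'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    }

    # Plain text file with one proxy URL per line
    PROXY_LIST = '/path/to/proxy/list.txt'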
