I'm currently assigning random proxies to requests via a custom middleware. I'd like to key download throttling to the specific proxy that the request is using, but as far
As recommended on the Scrapy mailing list, there is a special `Request.meta` key, `download_slot`, that Scrapy's downloader uses to pick the download slot for a request. The AutoThrottle extension adjusts delays per slot, so it follows this key too. This lets you group and throttle requests programmatically.
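Here is a rough, simplified model of that grouping. It is not Scrapy's actual code. When `meta['download_slot']` is present, it is used as the slot key. Otherwise the request's hostname is used. So requests to different sites that carry the same `download_slot` are throttled together. `slot_key` and `group_by_slot` are hypothetical helpers written only for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def slot_key(url, meta):
    """Simplified model of how a download slot key is chosen.

    An explicit meta['download_slot'] wins; otherwise the URL's hostname is used.
    (Scrapy can also key slots by resolved IP when CONCURRENT_REQUESTS_PER_IP
    is set; that branch is left out of this model.)
    """
    if "download_slot" in meta:
        return meta["download_slot"]
    return urlparse(url).hostname or ""


def group_by_slot(requests):
    """Group (url, meta) pairs by the slot they would be throttled under."""
    slots = defaultdict(list)
    for url, meta in requests:
        slots[slot_key(url, meta)].append(url)
    return dict(slots)


if __name__ == "__main__":
    reqs = [
        ("http://example.com/a", {"download_slot": "proxy-1"}),
        ("http://example.org/b", {"download_slot": "proxy-1"}),  # different host, same slot
        ("http://example.com/c", {}),  # falls back to the hostname
    ]
    for key, urls in group_by_slot(reqs).items():
        print(key, urls)
```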
In my custom proxy middleware:
```python
import random

self.proxies = get_proxies()  # list of proxies
proxy_address = random.choice(self.proxies)
request.meta['proxy'] = proxy_address
request.meta['download_slot'] = hash(proxy_address) % MAX_CONCURRENT_REQUESTS
```
I use `hash()` as a cheap way to bucket requests into at most `MAX_CONCURRENT_REQUESTS` slots, an externally defined limit on requests.
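Below is a fuller sketch of the same middleware. It rests on a few assumptions:

- `get_proxies()` and `MAX_CONCURRENT_REQUESTS` are hypothetical stand-ins for the originals.
- The proxy list is loaded once in `__init__` rather than on every request.
- `zlib.crc32` replaces `hash()`. Python's `hash()` of a string is randomized per interpreter process unless `PYTHONHASHSEED` is set, so with `hash()` the proxy-to-slot mapping changes between runs. `crc32` keeps it stable.

The `proxy-slot-` prefix on the keys is an arbitrary choice that makes them easy to tell apart from hostname-based slots.

```python
import random
import zlib

# Hypothetical limit on the number of slots; the original takes this from an
# externally defined MAX_CONCURRENT_REQUESTS.
MAX_CONCURRENT_REQUESTS = 8


def get_proxies():
    """Hypothetical stand-in for the original get_proxies(); replace with your own source."""
    return [
        "http://10.0.0.1:8080",
        "http://10.0.0.2:8080",
        "http://10.0.0.3:8080",
    ]


def proxy_slot(proxy_address, n_slots=MAX_CONCURRENT_REQUESTS):
    """Map a proxy URL to one of n_slots buckets.

    zlib.crc32 gives the same value in every interpreter run, unlike hash()
    on str, which is randomized per process.
    """
    bucket = zlib.crc32(proxy_address.encode("utf-8")) % n_slots
    return "proxy-slot-%d" % bucket


class RandomProxyMiddleware:
    """Downloader middleware sketch: random proxy per request, throttled per slot."""

    def __init__(self, proxies=None, n_slots=MAX_CONCURRENT_REQUESTS):
        # Load the proxy list once instead of on every request.
        self.proxies = list(proxies) if proxies is not None else get_proxies()
        self.n_slots = n_slots

    def process_request(self, request, spider):
        proxy_address = random.choice(self.proxies)
        request.meta["proxy"] = proxy_address
        # Requests with the same download_slot share one downloader slot,
        # so they share its concurrency and delay limits.
        request.meta["download_slot"] = proxy_slot(proxy_address, self.n_slots)
        return None  # continue normal processing of the request
```

Bucketing means several proxies can land in the same slot and be throttled together. If each proxy should be throttled independently, you could set `request.meta['download_slot'] = proxy_address` instead, which gives one slot per proxy. Either way, the middleware still has to be enabled in the `DOWNLOADER_MIDDLEWARES` setting.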