How to handle a 429 Too Many Requests response in Scrapy?

深忆病人 2020-12-28 22:54

I'm trying to run a scraper whose output log ends as follows:

2017-04-25 20:22:22 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <42

3 Answers
  •  悲&欢浪女
    2020-12-28 23:10

    Wow, your scraper is going really fast: over 30,000 requests in 30 minutes, which works out to roughly 17 requests per second.

    A volume that high is likely to trigger rate limiting on bigger sites and can overwhelm smaller ones entirely. Don't do that.

    This rate might also be too fast for Privoxy and Tor themselves, so they could be the source of some of those 429 replies as well.

    Solutions:

    1. Start slow. Reduce the concurrency settings and increase DOWNLOAD_DELAY so that you make at most 1 request per second. Then raise the rate step by step and see what happens. It might sound paradoxical, but by going slower you may end up with more items and more 200 responses.

    2. If you are scraping a big site, try rotating proxies. In my experience the Tor network can be a bit heavy-handed for this, so you might try a proxy service, as Umair suggests.
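
    Step 1 expressed as a Scrapy project's settings.py might look like the sketch below. The numbers are assumed starting points, not values taken from the answer, and the AutoThrottle and retry lines are optional additions beyond what the answer describes.

```python
# settings.py: a conservative starting point (sketch; tune for your target site).

# Roughly one request per second to any single site.
# With RANDOMIZE_DOWNLOAD_DELAY = True (Scrapy's default) the actual wait varies
# between 0.5x and 1.5x DOWNLOAD_DELAY, so this is an average, not an exact rate.
DOWNLOAD_DELAY = 1.0
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Optional: let AutoThrottle adjust the delay from observed latencies.
# AutoThrottle never sets a delay lower than DOWNLOAD_DELAY, so the floor above holds.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 60.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

# Optional: retry 429 responses a few times before they reach HttpErrorMiddleware
# (the component logging "Ignoring response"). 429 is listed explicitly in case
# your Scrapy version does not include it in the default RETRY_HTTP_CODES.
RETRY_ENABLED = True
RETRY_TIMES = 3
RETRY_HTTP_CODES = [429, 500, 502, 503, 504, 522, 524, 408]
```

    Once responses come back as 200 consistently, lower DOWNLOAD_DELAY or raise CONCURRENT_REQUESTS_PER_DOMAIN in small increments.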
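
    For step 2, one way to rotate proxies in Scrapy is a small downloader middleware that sets request.meta["proxy"], which the built-in HttpProxyMiddleware honors. This is a sketch under assumptions: the class name RandomProxyMiddleware and the ROTATING_PROXY_LIST setting are hypothetical, and the proxy URLs would come from whichever provider you use.

```python
import random


class RandomProxyMiddleware:
    """Hypothetical downloader middleware that picks a random proxy per request.

    It only chooses the proxy URL; Scrapy's built-in HttpProxyMiddleware does
    the actual routing based on request.meta["proxy"].
    """

    def __init__(self, proxies):
        self.proxies = list(proxies)
        if not self.proxies:
            raise ValueError("RandomProxyMiddleware needs at least one proxy URL")

    @classmethod
    def from_crawler(cls, crawler):
        # ROTATING_PROXY_LIST is a made-up setting name used for this sketch.
        return cls(crawler.settings.getlist("ROTATING_PROXY_LIST"))

    def process_request(self, request, spider):
        # Overwrite on every pass, so a request retried after a 429 (a copy that
        # carries the old meta) gets a newly drawn proxy, possibly the same one.
        request.meta["proxy"] = random.choice(self.proxies)
        # Returning None tells Scrapy to keep processing the request normally.
        return None
```

    To enable it, add it to DOWNLOADER_MIDDLEWARES with a priority below 750 (the default slot of HttpProxyMiddleware) so it runs first, for example {"myproject.middlewares.RandomProxyMiddleware": 350}, where myproject is a placeholder for your project package. The trade-off of always overwriting is that a proxy you set manually on a request gets replaced as well.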
