Scrapy Crawling Speed is Slow (60 pages / min)

Submitted by 非 Y 不嫁゛ on 2019-12-03 09:47:22

Question


I am experiencing slow crawl speeds with Scrapy (around 1 page / sec). I'm crawling a major website from AWS servers, so I don't think it's a network issue. CPU utilization is nowhere near 100%, and if I start multiple Scrapy processes the crawl speed is much faster.

Scrapy seems to crawl a bunch of pages, then hangs for several seconds, and then repeats.

I've tried playing with: CONCURRENT_REQUESTS = CONCURRENT_REQUESTS_PER_DOMAIN = 500

but this doesn't really seem to move the needle past about 20.
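For reference, the settings described above would sit in a project's `settings.py` roughly as follows. This is only a sketch: the value 500 comes from the question, while `DOWNLOAD_DELAY` and `AUTOTHROTTLE_ENABLED` are shown as assumptions about other Scrapy settings that can cap throughput. The question does not say whether they were checked.

```python
# settings.py (fragment) -- a sketch, not the asker's actual file.

# Values quoted in the question.
CONCURRENT_REQUESTS = 500
CONCURRENT_REQUESTS_PER_DOMAIN = 500

# Assumption: these are not mentioned in the question. They are listed
# because a non-zero delay or an enabled AutoThrottle extension would
# limit request rate regardless of the concurrency values above.
DOWNLOAD_DELAY = 0
AUTOTHROTTLE_ENABLED = False
```

Raising the concurrency limits only helps if nothing else, on the client or on the server, is pacing the requests.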


Answer 1:


Are you sure you are allowed to crawl the destination site at high speed? Many sites implement a download threshold and, after a while, start responding slowly.
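One way to check whether this is happening is to compare response latencies early in the crawl with those later on. The helper below is a hypothetical sketch, not part of Scrapy: the name `looks_throttled`, the window size, and the slowdown factor are all assumptions chosen for illustration.

```python
from statistics import median


def looks_throttled(latencies, window=20, factor=2.0):
    """Return True if recent responses are markedly slower than early ones.

    latencies: response times in seconds, in the order they were received.
    window:    number of samples compared at each end (assumed value).
    factor:    how many times slower counts as throttled (assumed value).
    """
    if len(latencies) < 2 * window:
        return False  # not enough samples to compare both ends
    early = median(latencies[:window])
    recent = median(latencies[-window:])
    return recent > factor * early
```

In a Scrapy spider, the latencies could be collected in a callback from `response.meta["download_latency"]`. If the function returns True while CPU and network on the crawler side stay idle, the slowdown is more likely on the server side than in the Scrapy configuration.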



Source: https://stackoverflow.com/questions/13505194/scrapy-crawling-speed-is-slow-60-pages-min
