Question
I am experiencing slow crawl speeds with Scrapy (around 1 page/sec). I'm crawling a major website from AWS servers, so I don't think it's a network issue. CPU utilization is nowhere near 100%, and if I start multiple Scrapy processes, the crawl speed is much faster.
Scrapy seems to crawl a bunch of pages, then hangs for several seconds, and then repeats.
I've tried playing with: CONCURRENT_REQUESTS = CONCURRENT_REQUESTS_PER_DOMAIN = 500
but this doesn't really seem to move the needle past about 20.
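For reference, the attempt described above would look roughly like this in a project's settings.py. This is only a sketch: the two concurrency values come from the question, while the other two settings are listed as an assumption about what else might cap throughput, shown at their Scrapy defaults.

```python
# settings.py (sketch) -- concurrency values as described in the question
CONCURRENT_REQUESTS = 500
CONCURRENT_REQUESTS_PER_DOMAIN = 500

# Assumption: other settings worth checking, shown at their Scrapy defaults.
DOWNLOAD_DELAY = 0            # seconds to wait between requests to the same site
AUTOTHROTTLE_ENABLED = False  # if True, Scrapy adapts the delay to server latency
```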
Answer 1:
Are you sure you are allowed to crawl the destination site at high speed? Many sites implement a download threshold and, after a while, start responding slowly.
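One rough way to check whether this is happening is to record per-response latencies and see whether they drift upward during the crawl. The helper below is a hypothetical sketch, not part of Scrapy: `looks_throttled` and its `ratio` threshold are made-up names and values, and it simply compares the mean latency of the later half of the samples against the earlier half.

```python
from statistics import mean


def looks_throttled(latencies, ratio=2.0):
    """Return True if the later half of `latencies` (seconds, in crawl order)
    is at least `ratio` times slower on average than the earlier half.

    Hypothetical helper; the 2.0 default is an arbitrary assumption.
    """
    if len(latencies) < 4:
        return False  # too few samples to compare two halves meaningfully
    half = len(latencies) // 2
    early, late = latencies[:half], latencies[half:]
    return mean(late) >= ratio * mean(early)


# Example: fast responses at first, then much slower ones.
samples = [0.2, 0.2, 0.2, 0.2, 1.0, 1.2, 1.1, 0.9]
print(looks_throttled(samples))
```

In a Scrapy spider, the samples could be collected from `response.meta["download_latency"]` in the callback. If the latencies do climb, the slowdown is more likely on the server side than in the concurrency settings.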
Source: https://stackoverflow.com/questions/13505194/scrapy-crawling-speed-is-slow-60-pages-min