Is Scrapy single-threaded or multi-threaded?

若如初见. Submitted on 2019-12-05 14:51:57

Question


There are a few concurrency settings in Scrapy, like CONCURRENT_REQUESTS. Does that mean the Scrapy crawler is multi-threaded? So if I run scrapy crawl my_crawler, will it literally fire multiple simultaneous requests in parallel? I'm asking because I've read that Scrapy is single-threaded.


Answer 1:


Scrapy is single-threaded, except for the interactive shell and some tests; see source.

It's built on top of Twisted, which is single-threaded too, and makes use of its own asynchronous concurrency capabilities, such as twisted.internet.interfaces.IReactorThreads.callFromThread; see source.
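
To illustrate how a single thread can still keep several requests in flight, here is a minimal sketch. It uses Python's standard-library asyncio as a stand-in for Twisted's reactor (it is not Scrapy or Twisted code), and the names fake_request and fetch_all and the example.com URLs are made up for the illustration. Each simulated request records the thread it ran on:

```python
import asyncio
import threading


async def fake_request(url, delay, seen_threads):
    # Hypothetical stand-in for a network call: sleeping yields control
    # to the event loop, the way waiting on a socket would.
    await asyncio.sleep(delay)
    # Record which OS thread resumed this coroutine.
    seen_threads.add(threading.get_ident())
    return url


async def fetch_all(urls, delay=0.01):
    seen_threads = set()
    # All coroutines are scheduled together and overlap while they wait.
    results = await asyncio.gather(
        *(fake_request(url, delay, seen_threads) for url in urls)
    )
    return results, seen_threads


def run_single_thread_demo():
    # Assumed example URLs; nothing is actually fetched.
    urls = [f"https://example.com/page/{i}" for i in range(5)]
    results, seen_threads = asyncio.run(fetch_all(urls))
    return urls, results, seen_threads


if __name__ == "__main__":
    urls, results, seen_threads = run_single_thread_demo()
    print("requests completed:", len(results))
    print("distinct threads used:", len(seen_threads))
```

All five simulated requests overlap in time, yet every one of them runs on the same thread as the event loop. This is the sense in which an event-driven framework can be concurrent without being multi-threaded.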




Answer 2:


Scrapy does most of its work synchronously. However, the handling of requests is done asynchronously.

I suggest this page if you haven't already seen it.

http://doc.scrapy.org/en/latest/topics/architecture.html

Edit: I realize now the question was about threading and not necessarily whether it's asynchronous or not. That link would still be a good read though :)

Regarding your question about CONCURRENT_REQUESTS: this setting changes the number of requests that Twisted will have in flight at once. Once that many requests have been started, it will wait for some of them to finish before starting more.
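
The "start up to N, then wait for one to finish" behaviour can be sketched with a semaphore on a single-threaded event loop. This is only an analogy written with the standard-library asyncio, not Scrapy's actual downloader code. The constant below merely borrows the setting's name, and fake_download, crawl_with_limit and the example.com URLs are hypothetical:

```python
import asyncio

# Local constant for this sketch; it only mirrors the name of the Scrapy setting.
CONCURRENT_REQUESTS = 3


async def fake_download(url, semaphore, stats):
    # At most `limit` coroutines may be inside this block at any moment.
    async with semaphore:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # simulated network wait
        stats["in_flight"] -= 1
    return url


async def crawl_with_limit(urls, limit=CONCURRENT_REQUESTS):
    semaphore = asyncio.Semaphore(limit)
    stats = {"in_flight": 0, "peak": 0}
    pages = await asyncio.gather(
        *(fake_download(url, semaphore, stats) for url in urls)
    )
    return pages, stats["peak"]


def run_limit_demo(n_urls=10, limit=CONCURRENT_REQUESTS):
    # Assumed example URLs; nothing is actually fetched.
    urls = [f"https://example.com/item/{i}" for i in range(n_urls)]
    return asyncio.run(crawl_with_limit(urls, limit))


if __name__ == "__main__":
    pages, peak = run_limit_demo()
    print("pages fetched:", len(pages))
    print("peak requests in flight:", peak)
```

Ten requests are queued, but the peak number in flight never exceeds the limit; a new one starts only when an earlier one completes. No additional threads are involved.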




Answer 3:


Scrapy is a single-threaded framework; we cannot use multiple threads within a spider at the same time. However, we can create multiple spiders and pipelines at the same time to make the process concurrent. Scrapy does not support multi-threading because it is built on Twisted, which is an asynchronous, event-driven networking framework.



Source: https://stackoverflow.com/questions/24761074/is-scrapy-single-threaded-or-multi-threaded
