What's the difference between python's multiprocessing and concurrent.futures?

dano

You actually should use the if __name__ == "__main__" guard with ProcessPoolExecutor, too: It's using multiprocessing.Process to populate its Pool under the covers, just like multiprocessing.Pool does, so all the same caveats regarding picklability (especially on Windows), etc. apply.
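
For illustration, here is a minimal sketch of the guard in use with ProcessPoolExecutor (the calculate function is just a stand-in mirroring the question's snippet):

from concurrent.futures import ProcessPoolExecutor

def calculate(x):
    # Worker functions must be defined at module level so they can be
    # pickled and sent to the child processes.
    return x * x

if __name__ == "__main__":
    # Without this guard, spawning the workers (especially on Windows)
    # re-imports the module and tries to launch workers all over again.
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(calculate, range(4)))
    print(results)  # [0, 1, 4, 9]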

I believe that ProcessPoolExecutor is meant to eventually replace multiprocessing.Pool, according to this statement made by Jesse Noller (a Python core contributor), when asked why Python has both APIs:

Brian and I need to work on the consolidation we intend(ed) to occur as people got comfortable with the APIs. My eventual goal is to remove anything but the basic multiprocessing.Process/Queue stuff out of MP and into concurrent.* and support threading backends for it.

For now, ProcessPoolExecutor is doing the exact same thing as multiprocessing.Pool with a simpler (and more limited) API. If you can get away with using ProcessPoolExecutor, use that, because I think it's more likely to get enhancements in the long-term.

Note that you can use all the helpers from multiprocessing with ProcessPoolExecutor, like Lock, Queue, Manager, etc. The main reasons to use multiprocessing.Pool are that you need initializer/initargs (though there is an open bug to get those added to ProcessPoolExecutor) or maxtasksperchild, or that you're running Python 2.7 or earlier and don't want to install (or require your users to install) the backport of concurrent.futures.
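
A rough sketch of both points (the worker functions, the Manager queue, and the initializer below are made-up examples, not code from the question):

import multiprocessing
from concurrent.futures import ProcessPoolExecutor

def log_square(q, x):
    q.put((x, x * x))  # Manager proxies are picklable, so workers can use them
    return x * x

def init_worker(level):
    # Runs once in each worker process when the Pool starts it.
    global LOG_LEVEL
    LOG_LEVEL = level

def square(x):
    return x * x

if __name__ == "__main__":
    # multiprocessing helpers (Manager, Lock, Queue, ...) work fine with
    # ProcessPoolExecutor.
    with multiprocessing.Manager() as manager:
        q = manager.Queue()
        with ProcessPoolExecutor() as executor:
            list(executor.map(log_square, [q] * 4, range(4)))
        while not q.empty():
            print(q.get())

    # initializer/initargs and maxtasksperchild are Pool-only features.
    with multiprocessing.Pool(initializer=init_worker,
                              initargs=("DEBUG",),
                              maxtasksperchild=10) as pool:
        print(pool.map(square, range(4)))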

Edit:

Also worth noting: According to this question, multiprocessing.Pool.map outperforms ProcessPoolExecutor.map. Note that the performance difference is very small per work item, so you'll probably only notice a large performance difference if you're using map on a very large iterable. The reason for the performance difference is that multiprocessing.Pool will batch the iterable passed to map into chunks, and then pass the chunks to the worker processes, which reduces the overhead of IPC between the parent and children. ProcessPoolExecutor always passes one item from the iterable at a time to the children, which can lead to much slower performance with large iterables, due to the increased IPC overhead. The good news is this issue will be fixed in Python 3.5, as a chunksize keyword argument has been added to ProcessPoolExecutor.map, which can be used to specify a larger chunk size when you know you're dealing with large iterables. See this bug for more info.
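
A quick sketch of the difference (the square workload and chunk size are made up, and the chunksize argument to ProcessPoolExecutor.map assumes Python 3.5+):

import time
from multiprocessing import Pool
from concurrent.futures import ProcessPoolExecutor

def square(x):
    return x * x

if __name__ == "__main__":
    data = range(1000000)

    start = time.time()
    with Pool() as pool:
        # Pool.map batches the iterable into chunks before sending it
        # to the workers.
        pool.map(square, data)
    print("multiprocessing.Pool.map:", time.time() - start)

    start = time.time()
    with ProcessPoolExecutor() as executor:
        # Before 3.5 this sent one item per IPC round-trip; from 3.5 on
        # you can pass chunksize to batch items yourself.
        list(executor.map(square, data, chunksize=1000))
    print("ProcessPoolExecutor.map:", time.time() - start)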

The if __name__ == '__main__': guard just checks whether you invoked the script from the command prompt with python <scriptname.py> [options], rather than importing it with import <scriptname> from the Python shell.

When you invoke a script from the command prompt, Python sets that module's __name__ to "__main__", so the code under the guard runs. In the second block, the

with ProcessPoolExecutor() as executor:
    result = executor.map(calculate, range(4))

block is executed regardless of whether the script was invoked from the command prompt or imported from the shell.
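
For example (the file name and print statements are illustrative only):

# scriptname.py
def calculate(x):
    return x * x

print("runs on import and when run directly")

if __name__ == '__main__':
    # Runs only for `python scriptname.py`, not for `import scriptname`.
    print([calculate(x) for x in range(4)])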
