Why is multiprocessing Pool slower than a for loop?

Backend · Open · 3 answers · 979 views
滥情空心 2020-12-17 03:58
from multiprocessing import Pool

def op1(data):
    # Add 1 to every element of one row.
    return [data[elem] + 1 for elem in range(len(data))]
data = [[elem for elem in range(20)] for elem in range(5000)]

3 Answers
  •  陌清茗 (OP)
     2020-12-17 05:00

    There are a couple of potential trouble spots with your code, but primarily it's too simple.

    The multiprocessing module works by creating different processes, and communicating among them. For each process created, you have to pay the operating system's process startup cost, as well as the Python interpreter startup cost. Those costs can be high or low, but they're non-zero in any case.
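    You can see that fixed cost directly by timing nothing but pool startup and teardown. A rough sketch (the pool size of 4 is my own choice, and the number you get depends heavily on your OS and start method):

        from multiprocessing import Pool
        import time

        if __name__ == "__main__":
            start = time.time()
            with Pool(4) as pool:
                pool.map(str, range(4))  # trivial job, just to make sure the workers actually started
            print("startup + teardown:", time.time() - start, "seconds")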

    Once you pay those startup costs, you then pool.map the worker function across all the processes, and that function basically adds 1 to a few numbers. Meanwhile, every argument and every result has to be pickled and shipped between processes, which costs far more than the additions themselves. The computation is not a significant load, as your tests prove.
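    For reference, the kind of comparison your tests presumably ran looks something like this sketch (reusing op1 and data from your post; the 4-process pool is again my own choice):

        from multiprocessing import Pool
        import time

        if __name__ == "__main__":
            start = time.time()
            serial = [op1(row) for row in data]   # plain for-loop version
            print("for loop:", time.time() - start)

            start = time.time()
            with Pool(4) as pool:
                parallel = pool.map(op1, data)    # same work, split across processes
            print("Pool.map:", time.time() - start)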

    What's worse, you're using .map(), which is implicitly ordered (compare with .imap_unordered()), so there's synchronization going on, leaving even less freedom for the various CPU cores to give you speed.
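    If you don't need the results back in input order, .imap_unordered() hands each result over as soon as any worker finishes it. A sketch (the chunksize value is just my guess at something reasonable):

        with Pool(4) as pool:
            # Results arrive in completion order, not submission order.
            for result in pool.imap_unordered(op1, data, chunksize=100):
                pass  # consume each result immediately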

    If there's a problem here, it's a "design of experiment" problem: you haven't created a sufficiently difficult problem for multiprocessing to be able to help you.
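    To see the pool actually win, give each task enough CPU work to dwarf the pickling and startup overhead. The op2 below is my own artificial example, not code from your post (it reuses the data list defined there):

        from multiprocessing import Pool

        def op2(row):
            # Repeat the cheap operation many times so each task is genuinely CPU-bound.
            total = 0
            for _ in range(100_000):
                total += sum(x + 1 for x in row)
            return total

        if __name__ == "__main__":
            with Pool(4) as pool:
                results = pool.map(op2, data, chunksize=50)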
