Python 3: does Pool keep the original order of data passed to map?

人走茶凉 提交于 2019-12-18 03:49:16

问题


I have written a little script to distribute workload between 4 threads and to test whether the results stay ordered (in respect to the order of the input):

from multiprocessing import Pool
import numpy as np
import time
import random


rows = 16
columns = 1000000

vals = np.arange(rows * columns, dtype=np.int32).reshape(rows, columns)

def worker(arr):
    time.sleep(random.random())        # let the process sleep a random
    for idx in np.ndindex(arr.shape):  # amount of time to ensure that
        arr[idx] += 1                  # the processes finish at different
                                       # time steps
    return arr

# create the threadpool
with Pool(4) as p:
    # schedule one map/worker for each row in the original data
    q = p.map(worker, [row for row in vals])

for idx, row in enumerate(q):
    print("[{:0>2}]: {: >8} - {: >8}".format(idx, row[0], row[-1]))

For me this always results in:

[00]:        1 -  1000000
[01]:  1000001 -  2000000
[02]:  2000001 -  3000000
[03]:  3000001 -  4000000
[04]:  4000001 -  5000000
[05]:  5000001 -  6000000
[06]:  6000001 -  7000000
[07]:  7000001 -  8000000
[08]:  8000001 -  9000000
[09]:  9000001 - 10000000
[10]: 10000001 - 11000000
[11]: 11000001 - 12000000
[12]: 12000001 - 13000000
[13]: 13000001 - 14000000
[14]: 14000001 - 15000000
[15]: 15000001 - 16000000

Question: So, does Pool really keep the original input's order when storing the results of each map function in q?

Sidenote: I am asking this, because I need an easy way to parallelize work over several workers. In some cases the ordering is irrelevant. However, there are some cases where the results (like in q) have to be returned in the original order, because I'm using an additional reduce function that relies on ordered data.

Performance: On my machine this operation is about 4 times faster (as expected, since I have 4 cores) than normal execution on a single process. Additionally, all 4 cores are at 100% usage during the runtime.


回答1:


Pool.map results are ordered. If you need order, great; if you don't, Pool.imap_unordered may be a useful optimization.

Note that while the order in which you receive the results from Pool.map is fixed, the order in which they are computed is arbitrary.




回答2:


The documentation bills it as a "parallel equivalent of the map() built-in function". Since map is guaranteed to preserve order, multiprocessing.Pool.map makes that guarantee too.



来源:https://stackoverflow.com/questions/41273960/python-3-does-pool-keep-the-original-order-of-data-passed-to-map

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!