I am performing some large computations on 3 different numpy 2D arrays sequentially. The arrays are huge, 25000x25000 each. Each computation takes significant time, so I decided to run them in parallel.
Here is an example using np.memmap and Pool. Note that you can choose the number of tasks and the number of worker processes independently. This approach does not give you explicit control over the task queue; if you need that, use multiprocessing.Queue directly (a sketch of that variant follows the example below):
from multiprocessing import Pool
import numpy as np

def mysum(array_file_name, col1, col2, shape):
    # Re-open the memory-mapped file in the worker and operate on a column slice
    a = np.memmap(array_file_name, dtype='float64', shape=shape, mode='r+')
    a[:, col1:col2] = np.random.random((shape[0], col2 - col1))
    ans = a[:, col1:col2].sum()
    del a  # flush changes and release the memmap
    return ans

if __name__ == '__main__':
    nop = 1000  # number of tasks (column chunks)
    now = 3     # number of worker processes
    p = Pool(now)
    array_file_name = 'test.array'
    shape = (25000, 25000)

    # Create the backing file once in the parent process
    a = np.memmap(array_file_name, dtype='float64', shape=shape, mode='w+')
    del a

    # Split the columns into nop chunks
    cols = [(shape[1] * i // nop, shape[1] * (i + 1) // nop) for i in range(nop)]

    results = []
    for c1, c2 in cols:
        r = p.apply_async(mysum, args=(array_file_name, c1, c2, shape))
        results.append(r)
    p.close()
    p.join()

    final_result = sum(r.get() for r in results)
    print(final_result)
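For completeness, here is a minimal sketch of the queue-based variant using multiprocessing.Process and multiprocessing.Queue. The worker function name and the None-sentinel convention are my own choices for illustration, not something required by numpy or multiprocessing:

from multiprocessing import Process, Queue
import numpy as np

def worker(array_file_name, shape, tasks, results):
    # Pull (col1, col2) tasks until the sentinel None arrives
    while True:
        task = tasks.get()
        if task is None:
            break
        col1, col2 = task
        a = np.memmap(array_file_name, dtype='float64', shape=shape, mode='r+')
        a[:, col1:col2] = np.random.random((shape[0], col2 - col1))
        results.put(a[:, col1:col2].sum())
        del a

if __name__ == '__main__':
    array_file_name = 'test.array'
    shape = (25000, 25000)
    nop, now = 1000, 3  # number of tasks, number of workers

    # Create the backing file once in the parent process
    a = np.memmap(array_file_name, dtype='float64', shape=shape, mode='w+')
    del a

    tasks, results = Queue(), Queue()
    workers = [Process(target=worker,
                       args=(array_file_name, shape, tasks, results))
               for _ in range(now)]
    for w in workers:
        w.start()

    for i in range(nop):
        tasks.put((shape[1] * i // nop, shape[1] * (i + 1) // nop))
    for _ in range(now):
        tasks.put(None)  # one sentinel per worker

    final_result = sum(results.get() for _ in range(nop))
    for w in workers:
        w.join()
    print(final_result)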
You can achieve better performance using shared-memory parallel processing, when possible. See this related question:
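As a rough illustration of the shared-memory approach (assuming Python 3.8+ and multiprocessing.shared_memory; the shm_sum helper and the SHAPE/DTYPE constants are my own, and the parent-side fill is only there to give the workers something to sum):

from multiprocessing import Pool, shared_memory
import numpy as np

SHAPE = (25000, 25000)
DTYPE = np.float64

def shm_sum(shm_name, col1, col2):
    # Attach to the existing shared block and view it as a numpy array
    shm = shared_memory.SharedMemory(name=shm_name)
    a = np.ndarray(SHAPE, dtype=DTYPE, buffer=shm.buf)
    ans = a[:, col1:col2].sum()
    del a        # drop the view before closing the buffer
    shm.close()  # detach only; the parent owns (and unlinks) the block
    return ans

if __name__ == '__main__':
    nbytes = int(np.prod(SHAPE)) * np.dtype(DTYPE).itemsize
    block = shared_memory.SharedMemory(create=True, size=nbytes)
    a = np.ndarray(SHAPE, dtype=DTYPE, buffer=block.buf)
    a[:] = np.random.random(SHAPE)  # fill once in the parent

    nop, now = 1000, 3
    cols = [(SHAPE[1] * i // nop, SHAPE[1] * (i + 1) // nop) for i in range(nop)]

    with Pool(now) as p:
        results = [p.apply_async(shm_sum, args=(block.name, c1, c2))
                   for c1, c2 in cols]
        final_result = sum(r.get() for r in results)

    print(final_result)
    del a
    block.close()
    block.unlink()  # free the shared memory segment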