Python multiprocessing is taking much longer than single processing

礼貌的吻别 2021-01-04 07:59

I am performing some large computations on 3 different numpy 2D arrays sequentially. The arrays are huge, 25000x25000 each. Each computation takes significant time, so I decided to run them in parallel with multiprocessing, expecting a speed-up; instead, the multiprocessing version takes much longer than running the computations one after another.

2 Answers
  •  感情败类
    2021-01-04 08:11

    Here is an example using np.memmap and Pool. Note that you can set the number of tasks and the number of worker processes independently. With Pool you don't have direct control over the task queue; if you need that, use multiprocessing.Queue (a sketch of that variant follows the example below):

    from multiprocessing import Pool
    
    import numpy as np
    
    def mysum(array_file_name, col1, col2, shape):
        # Re-open the memmap inside the worker; the dtype must match the
        # file (the uint8 default would silently truncate the floats).
        a = np.memmap(array_file_name, dtype='float32', shape=shape, mode='r+')
        a[:, col1:col2] = np.random.random((shape[0], col2-col1))
        ans = a[:, col1:col2].sum()
        del a  # flush the writes and release the mapping
        return ans
    
    if __name__ == '__main__':
        nop = 1000  # number of tasks (column chunks), not processes
        now = 3     # number of worker processes
        p = Pool(now)
        array_file_name = 'test.array'
        shape = (25000, 25000)
        # Create the backing file once; the workers re-map it read/write.
        a = np.memmap(array_file_name, dtype='float32', shape=shape, mode='w+')
        del a
        # Integer division so the slice bounds are valid indices.
        cols = [[shape[1]*i//nop, shape[1]*(i+1)//nop] for i in range(nop)]
        results = []
        for c1, c2 in cols:
            r = p.apply_async(mysum, args=(array_file_name, c1, c2, shape))
            results.append(r)
        p.close()
        p.join()
    
        final_result = sum([r.get() for r in results])
        print(final_result)
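
    If you do want explicit control over the queue, here is a minimal sketch of the multiprocessing.Queue variant mentioned above. The worker loop and the None sentinels are my own additions; the chunking and the memmap layout mirror the example:

    from multiprocessing import Process, Queue

    import numpy as np

    def worker(tasks, results, array_file_name, shape):
        # Pull (col1, col2) tasks until a None sentinel arrives.
        while True:
            task = tasks.get()
            if task is None:
                break
            col1, col2 = task
            a = np.memmap(array_file_name, dtype='float32', shape=shape, mode='r+')
            a[:, col1:col2] = np.random.random((shape[0], col2-col1))
            results.put(float(a[:, col1:col2].sum()))
            del a

    if __name__ == '__main__':
        nop = 1000  # number of tasks (column chunks)
        now = 3     # number of workers
        array_file_name = 'test.array'
        shape = (25000, 25000)
        np.memmap(array_file_name, dtype='float32', shape=shape, mode='w+')
        tasks, results = Queue(), Queue()
        workers = [Process(target=worker, args=(tasks, results, array_file_name, shape))
                   for _ in range(now)]
        for w in workers:
            w.start()
        step = shape[1] // nop
        for i in range(nop):
            tasks.put((i*step, (i+1)*step))
        for _ in range(now):
            tasks.put(None)  # one sentinel per worker
        # Drain all results before joining, so no worker blocks on put().
        final_result = sum(results.get() for _ in range(nop))
        for w in workers:
            w.join()
        print(final_result)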
    

    You can achieve better performance with shared-memory parallel processing, when possible. See this related question (a sketch using multiprocessing.shared_memory follows the link below):

    • Shared-memory objects in python multiprocessing
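
    For example, here is a minimal sketch using multiprocessing.shared_memory (Python 3.8+). The helper name chunk_sum and the float32 dtype are illustrative choices, not taken from the linked question:

    from multiprocessing import Pool, shared_memory

    import numpy as np

    def chunk_sum(shm_name, col1, col2, shape):
        # Attach to the existing block and view it as a 2D float32 array.
        shm = shared_memory.SharedMemory(name=shm_name)
        a = np.ndarray(shape, dtype=np.float32, buffer=shm.buf)
        a[:, col1:col2] = np.random.random((shape[0], col2-col1))
        ans = float(a[:, col1:col2].sum())
        del a      # drop the view before closing, else close() raises
        shm.close()
        return ans

    if __name__ == '__main__':
        nop, now = 1000, 3
        shape = (25000, 25000)
        nbytes = shape[0] * shape[1] * np.dtype(np.float32).itemsize
        shm = shared_memory.SharedMemory(create=True, size=nbytes)
        try:
            step = shape[1] // nop
            with Pool(now) as p:
                results = [p.apply_async(chunk_sum, (shm.name, i*step, (i+1)*step, shape))
                           for i in range(nop)]
                final_result = sum(r.get() for r in results)
            print(final_result)
        finally:
            shm.close()
            shm.unlink()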
