Use numpy array in shared memory for multiprocessing

backend · open · 5 answers · 1615 views
隐瞒了意图╮ 2020-11-22 03:51

I would like to use a numpy array in shared memory for use with the multiprocessing module. The difficulty is using it like a numpy array, and not just as a ctypes array.
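(Not from any of the answers here, just a sketch of the technique this question usually ends up with: allocate the buffer with multiprocessing.Array and wrap it with np.frombuffer, so every process sees the same memory through an ordinary numpy view. The names to_numpy, init and worker below are my own placeholders, and the dtype and shape are arbitrary.)

    import multiprocessing
    import numpy as np

    def to_numpy(shared, shape):
        # np.frombuffer reuses the underlying ctypes buffer, so no copy is made;
        # reshape only changes the view, not the memory
        return np.frombuffer(shared.get_obj()).reshape(shape)

    def init(shared):
        # runs once in every worker: expose the shared Array as a module-level global
        global shared_arr
        shared_arr = shared

    def worker(i):
        arr = to_numpy(shared_arr, (10,))
        arr[i] = -arr[i]  # writes land in shared memory, visible to the parent

    if __name__ == '__main__':
        # 'd' is C double, which matches np.frombuffer's default float64 dtype
        shared_arr = multiprocessing.Array('d', 10)
        arr = to_numpy(shared_arr, (10,))
        arr[:] = np.random.rand(10)

        with multiprocessing.Pool(4, initializer=init, initargs=(shared_arr,)) as pool:
            pool.map(worker, range(10))

        print(arr)  # the negated values written by the workers show up here

The np.frombuffer wrap is what lets you treat the ctypes buffer like a real numpy array; if the workers only ever read the data, the fork-based answer below avoids even this step.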

5 Answers
  •  孤城傲影
    2020-11-22 03:59

    While the answers already given are good, there is a much easier solution to this problem provided two conditions are met:

    1. You are on a POSIX-compliant operating system (e.g. Linux, Mac OSX); and
    2. Your child processes need read-only access to the shared array.

    In this case you do not need to fiddle with explicitly making variables shared, as the child processes are created with a fork. A forked child automatically shares the parent's memory space (copy-on-write). In the context of Python multiprocessing, this means it shares all module-level variables; note that this does not hold for arguments that you explicitly pass to your child processes or to the functions you call on a multiprocessing.Pool.

    A simple example:

    import multiprocessing
    import numpy as np
    
    # will hold the (implicitly mem-shared) data
    data_array = None
    
    # child worker function
    def job_handler(num):
        # built-in id() returns the identity of an object (its memory address in CPython)
        return id(data_array), np.sum(data_array)
    
    def launch_jobs(data, num_jobs=5, num_worker=4):
        global data_array
        data_array = data
    
        pool = multiprocessing.Pool(num_worker)
        return pool.map(job_handler, range(num_jobs))
    
    # create some random data and execute the child jobs
    mem_ids, sumvals = zip(*launch_jobs(np.random.rand(10)))
    
    # this will print 'True' on POSIX OS, since the data was shared
    print(np.all(np.asarray(mem_ids) == id(data_array)))
    
