Use numpy array in shared memory for multiprocessing

backend · open · 5 answers · 1615 views
隐瞒了意图╮ 2020-11-22 03:51

I would like to use a numpy array in shared memory for use with the multiprocessing module. The difficulty is using it like a numpy array, and not just as a ctypes array.
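(Not from any of the answers here, just a sketch of the technique this question usually ends up with: allocate the buffer with multiprocessing.Array and wrap it with np.frombuffer, so every process sees the same memory through an ordinary numpy view. The names to_numpy, init and worker below are my own placeholders, and the dtype and shape are arbitrary.)

    import multiprocessing
    import numpy as np

    def to_numpy(shared, shape):
        # np.frombuffer reuses the underlying ctypes buffer, so no copy is made;
        # reshape only changes the view, not the memory
        return np.frombuffer(shared.get_obj()).reshape(shape)

    def init(shared):
        # runs once in every worker: expose the shared Array as a module-level global
        global shared_arr
        shared_arr = shared

    def worker(i):
        arr = to_numpy(shared_arr, (10,))
        arr[i] = -arr[i]  # writes land in shared memory, visible to the parent

    if __name__ == '__main__':
        # 'd' is C double, which matches np.frombuffer's default float64 dtype
        shared_arr = multiprocessing.Array('d', 10)
        arr = to_numpy(shared_arr, (10,))
        arr[:] = np.random.rand(10)

        with multiprocessing.Pool(4, initializer=init, initargs=(shared_arr,)) as pool:
            pool.map(worker, range(10))

        print(arr)  # the negated values written by the workers show up here

The np.frombuffer wrap is what lets you treat the ctypes buffer like a real numpy array; if the workers only ever read the data, the fork-based answer below avoids even this step.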

5 Answers
  •  孤城傲影
    2020-11-22 03:59

    While the answers already given are good, there is a much easier solution to this problem provided two conditions are met:

    1. You are on a POSIX-compliant operating system (e.g. Linux, Mac OSX); and
    2. Your child processes need read-only access to the shared array.

    In this case you do not need to fiddle with explicitly making variables shared, as the child processes are created with a fork. A forked child automatically shares the parent's memory space (copy-on-write). In the context of Python multiprocessing, this means it shares all module-level variables; note that this does not hold for arguments that you explicitly pass to your child processes or to the functions you call on a multiprocessing.Pool.

    A simple example:

    import multiprocessing
    import numpy as np
    
    # will hold the (implicitly mem-shared) data
    data_array = None
    
    # child worker function
    def job_handler(num):
        # built-in id() returns the identity of an object (its memory address in CPython)
        return id(data_array), np.sum(data_array)
    
    def launch_jobs(data, num_jobs=5, num_worker=4):
        global data_array
        data_array = data
    
        pool = multiprocessing.Pool(num_worker)
        return pool.map(job_handler, range(num_jobs))
    
    # create some random data and execute the child jobs
    mem_ids, sumvals = zip(*launch_jobs(np.random.rand(10)))
    
    # this will print 'True' on POSIX OS, since the data was shared
    print(np.all(np.asarray(mem_ids) == id(data_array)))
    
