I would like to use a numpy array in shared memory for use with the multiprocessing module. The difficulty is using it like a numpy array, and not just as a ctypes array.
While the answers already given are good, there is a much easier solution to this problem, provided two conditions are met:

1. you are on a POSIX-compliant operating system (e.g. Linux, macOS), so that child processes are created with fork; and
2. your child processes only need read-only access to the shared array.
In this case you do not need to fiddle with explicitly making variables shared, because the child processes are created with a fork. A forked child automatically shares the parent's memory space (copy-on-write); in the context of Python multiprocessing, this means it shares all module-level variables. Note that this does not hold for arguments that you explicitly pass to your child processes or to the functions you call on a multiprocessing.Pool; those are pickled and copied. Also note that, because of copy-on-write, any writes a child makes to such a variable stay local to that child, which is why the read-only condition above matters.
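Whether a fork is actually used depends on the multiprocessing start method: it is the default on Linux, but since Python 3.8 macOS defaults to spawn, under which nothing is implicitly shared. As a minimal sketch using the standard multiprocessing API (this check is mine, not part of the example below), you can select and verify the start method explicitly on POSIX systems:

import multiprocessing

if __name__ == '__main__':
    # must be called at most once, before any Pool or Process is created
    multiprocessing.set_start_method('fork')
    print(multiprocessing.get_start_method())  # -> 'fork'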
A simple example:
import multiprocessing
import numpy as np

# will hold the (implicitly mem-shared) data
data_array = None

# child worker function
def job_handler(num):
    # in CPython, the built-in id() returns the object's memory address,
    # so identical ids across processes indicate the same (shared) object
    return id(data_array), np.sum(data_array)

def launch_jobs(data, num_jobs=5, num_worker=4):
    global data_array
    data_array = data
    pool = multiprocessing.Pool(num_worker)
    return pool.map(job_handler, range(num_jobs))
# create some random data and execute the child jobs
mem_ids, sumvals = zip(*launch_jobs(np.random.rand(10)))

# this prints 'True' where children are forked (e.g. Linux; macOS
# defaults to 'spawn' since Python 3.8), since the data was shared
print(np.all(np.asarray(mem_ids) == id(data_array)))
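If the children must also write to the array, the implicit fork-based sharing above is not enough, since copy-on-write makes their writes invisible to the parent. A common alternative, sketched below under the assumption of a fork start method, is to back the numpy array with a multiprocessing.RawArray; the names shared_buf, shared_np and worker are illustrative, not from the example above. This also addresses the original concern: np.frombuffer gives you a real numpy view onto the ctypes buffer, with no copying.

import multiprocessing
import numpy as np

# unsynchronized shared-memory buffer holding 10 doubles
shared_buf = multiprocessing.RawArray('d', 10)
# zero-copy numpy view onto that buffer; full numpy semantics apply
shared_np = np.frombuffer(shared_buf, dtype=np.float64)

def worker(i):
    # writes land in the shared buffer, so the parent sees them;
    # each worker touches a distinct index, so no lock is needed
    shared_np[i] = i * i

if __name__ == '__main__':
    with multiprocessing.Pool(4) as pool:
        pool.map(worker, range(10))
    print(shared_np)  # reflects the children's writes

From Python 3.8 onwards, multiprocessing.shared_memory.SharedMemory provides a similar zero-copy buffer that also works with the spawn start method.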