Getting erratic runtime exceptions trying to access persistent data in multiprocessing.Pool worker processes

Asked on 2019-12-11 06:09:13

Question


Inspired by this solution I am trying to set up a multiprocessing pool of worker processes in Python. The idea is to pass some data to the worker processes before they actually start their work and to reuse it repeatedly afterwards. This is intended to minimize the amount of data which needs to be packed/unpacked for every call into a worker process (i.e. to reduce inter-process communication overhead). My MCVE looks like this:

import multiprocessing as mp
import numpy as np

def create_worker_context():
    global context # create "global" context in worker process
    context = {}

def init_worker_context(worker_id, some_const_array, DIMS, DTYPE):
    context.update({
        'worker_id': worker_id,
        'some_const_array': some_const_array,
        'tmp': np.zeros((DIMS, DIMS), dtype = DTYPE),
        }) # store context information in global namespace of worker
    return True # return True, verifying that the worker process received its data

class data_analysis:
    def __init__(self):
        self.DTYPE = 'float32'
        self.CPU_LEN = mp.cpu_count()
        self.DIMS = 100
        self.some_const_array = np.zeros((self.DIMS, self.DIMS), dtype = self.DTYPE)
        # Init multiprocessing pool
        self.cpu_pool = mp.Pool(processes = self.CPU_LEN, initializer = create_worker_context) # create pool and context in workers
        pool_results = [
            self.cpu_pool.apply_async(
                init_worker_context,
                args = (core_id, self.some_const_array, self.DIMS, self.DTYPE)
            ) for core_id in range(self.CPU_LEN)
            ] # pass information to workers' context
        result_batches = [result.get() for result in pool_results] # check if they got the information
        if not all(result_batches): # raise an error if things did not work
            raise SyntaxError('Workers could not be initialized ...')

    @staticmethod
    def process_batch(batch_data):
        context['tmp'][:,:] = context['some_const_array'] + batch_data # some fancy computation in worker
        return context['tmp'] # return result

    def process_all(self):
        input_data = np.arange(0, self.DIMS ** 2, dtype = self.DTYPE).reshape(self.DIMS, self.DIMS)
        pool_results = [
            self.cpu_pool.apply_async(
                data_analysis.process_batch,
                args = (input_data,)
            ) for _ in range(self.CPU_LEN)
            ] # let workers actually work
        result_batches = [result.get() for result in pool_results]
        for batch in result_batches[1:]:
            np.add(result_batches[0], batch, out = result_batches[0]) # reduce batches
        print(result_batches[0]) # show result

if __name__ == '__main__':
    data_analysis().process_all()

I am running the above with CPython 3.6.6.

The strange thing is ... sometimes it works, sometimes it does not. If it does not work, the process_batch method throws an exception because it cannot find some_const_array as a key in context. The full traceback looks like this:

(env) me@box:/path> python so.py 
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/path/so.py", line 37, in process_batch
    context['tmp'][:,:] = context['some_const_array'] + batch_data # some fancy computation in worker
KeyError: 'some_const_array'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/path/so.py", line 54, in <module>
    data_analysis().process_all()
  File "/path/so.py", line 48, in process_all
    result_batches = [result.get() for result in pool_results]
  File "/path/so.py", line 48, in <listcomp>
    result_batches = [result.get() for result in pool_results]
  File "/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
KeyError: 'some_const_array'

I am puzzled. What is going on here?

If my context dictionaries contain an object of "higher type", e.g. a database driver or similar, I do not get this kind of problem. I can only reproduce it if my context dictionaries contain basic Python data types, collections or numpy arrays.

(Is there a potentially better approach for achieving the same thing in a more reliable manner? I know my approach is considered a hack ...)


Answer 1:


You need to relocate the content of init_worker_context into your initializer function create_worker_context.
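For illustration only (this code is not part of the original answer), here is a minimal sketch of what that relocation could look like: the constant data is handed to create_worker_context through the pool's initargs, so every worker builds its context exactly once before it can receive a task. The per-worker worker_id from the MCVE is dropped here, because the initializer receives the same arguments in every worker:

import multiprocessing as mp
import numpy as np

def create_worker_context(some_const_array, DIMS, DTYPE):
    global context # pool initializer: runs exactly once in every worker process
    context = {
        'some_const_array': some_const_array,
        'tmp': np.zeros((DIMS, DIMS), dtype = DTYPE),
        } # context is guaranteed to exist before the worker handles any task

def process_batch(batch_data):
    context['tmp'][:,:] = context['some_const_array'] + batch_data # safe: context was built at worker start-up
    return context['tmp']

if __name__ == '__main__':
    DTYPE = 'float32'
    DIMS = 100
    CPU_LEN = mp.cpu_count()
    some_const_array = np.zeros((DIMS, DIMS), dtype = DTYPE)
    cpu_pool = mp.Pool(
        processes = CPU_LEN,
        initializer = create_worker_context,
        initargs = (some_const_array, DIMS, DTYPE), # same arguments for every worker
        )
    input_data = np.arange(0, DIMS ** 2, dtype = DTYPE).reshape(DIMS, DIMS)
    pool_results = [cpu_pool.apply_async(process_batch, args = (input_data,)) for _ in range(CPU_LEN)]
    result_batches = [result.get() for result in pool_results]
    total = result_batches[0].copy()
    for batch in result_batches[1:]:
        np.add(total, batch, out = total) # reduce batches
    print(total)
    cpu_pool.close()
    cpu_pool.join()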

Your assumption that every single worker process will run init_worker_context exactly once is responsible for your confusion. The tasks you submit to a pool are fed into one internal task queue that all worker processes read from. What happens in your case is that some worker processes finish their task and immediately compete for new ones. So one worker process may end up executing several of your init_worker_context tasks while another never receives a single one and therefore ends up with an empty context. Keep in mind that the OS schedules CPU time for the worker processes' threads, so which worker picks up which task is not deterministic.
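As a quick illustration of that queue behaviour (again not from the original answer, just a hypothetical check), the following sketch submits one trivial task per CPU via apply_async and reports which worker PID actually picked each one up; with tasks this fast you will often see fewer distinct PIDs than workers:

import multiprocessing as mp
import os

def which_worker(task_id):
    # report the PID of whichever worker process picked this task up
    return os.getpid()

if __name__ == '__main__':
    n = mp.cpu_count()
    with mp.Pool(processes = n) as pool:
        results = [pool.apply_async(which_worker, args = (i,)) for i in range(n)]
        pids = [result.get() for result in results]
    print('distinct workers used:', len(set(pids)), 'of', n) # often fewer than n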



Source: https://stackoverflow.com/questions/53743047/getting-erratic-runtime-exceptions-trying-to-access-persistant-data-in-multiproc
