Getting erratic runtime exceptions trying to access persistent data in multiprocessing.Pool worker processes

Asked on 2019-12-11 06:09:13

Question


Inspired by this solution I am trying to set up a multiprocessing pool of worker processes in Python. The idea is to pass some data to the worker processes before they actually start their work and to reuse it repeatedly afterwards. This is intended to minimize the amount of data which needs to be packed/unpacked for every call into a worker process (i.e. to reduce inter-process communication overhead). My MCVE looks like this:

import multiprocessing as mp
import numpy as np

def create_worker_context():
    global context # create "global" context in worker process
    context = {}

def init_worker_context(worker_id, some_const_array, DIMS, DTYPE):
    context.update({
        'worker_id': worker_id,
        'some_const_array': some_const_array,
        'tmp': np.zeros((DIMS, DIMS), dtype = DTYPE),
        }) # store context information in global namespace of worker
    return True # return True, verifying that the worker process received its data

class data_analysis:
    def __init__(self):
        self.DTYPE = 'float32'
        self.CPU_LEN = mp.cpu_count()
        self.DIMS = 100
        self.some_const_array = np.zeros((self.DIMS, self.DIMS), dtype = self.DTYPE)
        # Init multiprocessing pool
        self.cpu_pool = mp.Pool(processes = self.CPU_LEN, initializer = create_worker_context) # create pool and context in workers
        pool_results = [
            self.cpu_pool.apply_async(
                init_worker_context,
                args = (core_id, self.some_const_array, self.DIMS, self.DTYPE)
            ) for core_id in range(self.CPU_LEN)
            ] # pass information to workers' context
        result_batches = [result.get() for result in pool_results] # check if they got the information
        if not all(result_batches): # raise an error if things did not work
            raise SyntaxError('Workers could not be initialized ...')

    @staticmethod
    def process_batch(batch_data):
        context['tmp'][:,:] = context['some_const_array'] + batch_data # some fancy computation in worker
        return context['tmp'] # return result

    def process_all(self):
        input_data = np.arange(0, self.DIMS ** 2, dtype = self.DTYPE).reshape(self.DIMS, self.DIMS)
        pool_results = [
            self.cpu_pool.apply_async(
                data_analysis.process_batch,
                args = (input_data,)
            ) for _ in range(self.CPU_LEN)
            ] # let workers actually work
        result_batches = [result.get() for result in pool_results]
        for batch in result_batches[1:]:
            np.add(result_batches[0], batch, out = result_batches[0]) # reduce batches
        print(result_batches[0]) # show result

if __name__ == '__main__':
    data_analysis().process_all()

I am running the above with CPython 3.6.6.

The strange thing is ... sometimes it works, sometimes it does not. If it does not work, the process_batch method throws an exception because it cannot find some_const_array as a key in context. The full traceback looks like this:

(env) me@box:/path> python so.py 
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/path/so.py", line 37, in process_batch
    context['tmp'][:,:] = context['some_const_array'] + batch_data # some fancy computation in worker
KeyError: 'some_const_array'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/path/so.py", line 54, in <module>
    data_analysis().process_all()
  File "/path/so.py", line 48, in process_all
    result_batches = [result.get() for result in pool_results]
  File "/path/so.py", line 48, in <listcomp>
    result_batches = [result.get() for result in pool_results]
  File "/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
KeyError: 'some_const_array'

I am puzzled. What is going on here?

If my context dictionaries contain an object of "higher type", e.g. a database driver or similar, I do not get this kind of problem. I can only reproduce it if my context dictionaries contain basic Python data types, collections or numpy arrays.

(Is there a potentially better approach for achieving the same thing in a more reliable manner? I know my approach is considered a hack ...)


Answer 1:


You need to relocate the content of init_worker_context into your initializer function create_worker_context.
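For illustration only (this code is not part of the original answer), here is a minimal sketch of what that relocation could look like: the constant data is handed to create_worker_context through the pool's initargs, so every worker builds its context exactly once before it can receive a task. The per-worker worker_id from the MCVE is dropped here, because the initializer receives the same arguments in every worker:

import multiprocessing as mp
import numpy as np

def create_worker_context(some_const_array, DIMS, DTYPE):
    global context # pool initializer: runs exactly once in every worker process
    context = {
        'some_const_array': some_const_array,
        'tmp': np.zeros((DIMS, DIMS), dtype = DTYPE),
        } # context is guaranteed to exist before the worker handles any task

def process_batch(batch_data):
    context['tmp'][:,:] = context['some_const_array'] + batch_data # safe: context was built at worker start-up
    return context['tmp']

if __name__ == '__main__':
    DTYPE = 'float32'
    DIMS = 100
    CPU_LEN = mp.cpu_count()
    some_const_array = np.zeros((DIMS, DIMS), dtype = DTYPE)
    cpu_pool = mp.Pool(
        processes = CPU_LEN,
        initializer = create_worker_context,
        initargs = (some_const_array, DIMS, DTYPE), # same arguments for every worker
        )
    input_data = np.arange(0, DIMS ** 2, dtype = DTYPE).reshape(DIMS, DIMS)
    pool_results = [cpu_pool.apply_async(process_batch, args = (input_data,)) for _ in range(CPU_LEN)]
    result_batches = [result.get() for result in pool_results]
    total = result_batches[0].copy()
    for batch in result_batches[1:]:
        np.add(total, batch, out = total) # reduce batches
    print(total)
    cpu_pool.close()
    cpu_pool.join()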

Your assumption that every single worker process will run init_worker_context exactly once is responsible for your confusion. The tasks you submit to a pool are fed into one internal task queue that all worker processes read from. What happens in your case is that some worker processes finish their task and immediately compete for new ones. So one worker process may end up executing several of your init_worker_context tasks while another never receives a single one and therefore ends up with an empty context. Keep in mind that the OS schedules CPU time for the worker processes' threads, so which worker picks up which task is not deterministic.
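As a quick illustration of that queue behaviour (again not from the original answer, just a hypothetical check), the following sketch submits one trivial task per CPU via apply_async and reports which worker PID actually picked each one up; with tasks this fast you will often see fewer distinct PIDs than workers:

import multiprocessing as mp
import os

def which_worker(task_id):
    # report the PID of whichever worker process picked this task up
    return os.getpid()

if __name__ == '__main__':
    n = mp.cpu_count()
    with mp.Pool(processes = n) as pool:
        results = [pool.apply_async(which_worker, args = (i,)) for i in range(n)]
        pids = [result.get() for result in results]
    print('distinct workers used:', len(set(pids)), 'of', n) # often fewer than n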



Source: https://stackoverflow.com/questions/53743047/getting-erratic-runtime-exceptions-trying-to-access-persistant-data-in-multiproc
