I\'m aware of various discussions of limitations of the multiprocessing module when dealing with functions that are data members of a class (due to Pickling problems).
Steven Bethard has posted a way to allow methods to be pickled/unpickled. You could use it like this:
import multiprocessing as mp
import copy_reg
import types
def _pickle_method(method):
# Author: Steven Bethard
# http://bytes.com/topic/python/answers/552476-why-cant-you-pickle-instancemethods
func_name = method.im_func.__name__
obj = method.im_self
cls = method.im_class
cls_name = ''
if func_name.startswith('__') and not func_name.endswith('__'):
cls_name = cls.__name__.lstrip('_')
if cls_name:
func_name = '_' + cls_name + func_name
return _unpickle_method, (func_name, obj, cls)
def _unpickle_method(func_name, obj, cls):
# Author: Steven Bethard
# http://bytes.com/topic/python/answers/552476-why-cant-you-pickle-instancemethods
for cls in cls.mro():
try:
func = cls.__dict__[func_name]
except KeyError:
pass
else:
break
return func.__get__(obj, cls)
# This call to copy_reg.pickle allows you to pass methods as the first arg to
# mp.Pool methods. If you comment out this line, `pool.map(self.foo, ...)` results in
# PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup
# __builtin__.instancemethod failed
copy_reg.pickle(types.MethodType, _pickle_method, _unpickle_method)
class MyClass(object):
def __init__(self):
self.my_args = [1,2,3,4]
self.output = {}
def my_single_function(self, arg):
return arg**2
def my_parallelized_function(self):
# Use map or map_async to map my_single_function onto the
# list of self.my_args, and append the return values into
# self.output, using each arg in my_args as the key.
# The result should make self.output become
# {1:1, 2:4, 3:9, 4:16}
self.output = dict(zip(self.my_args,
pool.map(self.my_single_function, self.my_args)))
Then
pool = mp.Pool()
foo = MyClass()
foo.my_parallelized_function()
yields
print foo.output
# {1: 1, 2: 4, 3: 9, 4: 16}
There is a better elegant solution i believe. Add the following line to a code that does multiprocessing with the class and you can still pass the method through the pool. the codes should go above the class
import copy_reg
import types
def _reduce_method(meth):
return (getattr,(meth.__self__,meth.__func__.__name__))
copy_reg.pickle(types.MethodType,_reduce_method)
for more understanding of how to pickle a method please see below http://docs.python.org/2/library/copy_reg.html
If you use a fork of multiprocessing called pathos.multiprocesssing, you can directly use classes and class methods in multiprocessing's map functions. This is because dill is used instead of pickle or cPickle, and dill can serialize almost anything in python.
pathos.multiprocessing also provides an asynchronous map function… and it can map functions with multiple arguments (e.g. map(math.pow, [1,2,3], [4,5,6]))
See: What can multiprocessing and dill do together?
and: http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization/
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>>
>>> p = Pool(4)
>>>
>>> def add(x,y):
... return x+y
...
>>> x = [0,1,2,3]
>>> y = [4,5,6,7]
>>>
>>> p.map(add, x, y)
[4, 6, 8, 10]
>>>
>>> class Test(object):
... def plus(self, x, y):
... return x+y
...
>>> t = Test()
>>>
>>> p.map(Test.plus, [t]*4, x, y)
[4, 6, 8, 10]
>>>
>>> p.map(t.plus, x, y)
[4, 6, 8, 10]
So you can do exactly what you wanted to do, I believe.
Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>>
>>> class MyClass():
... def __init__(self):
... self.my_args = [1,2,3,4]
... self.output = {}
... def my_single_function(self, arg):
... return arg**2
... def my_parallelized_function(self):
... res = p.map(self.my_single_function, self.my_args)
... self.output = dict(zip(self.my_args, res))
...
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> p = Pool()
>>>
>>> foo = MyClass()
>>> foo.my_parallelized_function()
>>> foo.output
{1: 1, 2: 4, 3: 9, 4: 16}
>>>
Get the code here: https://github.com/uqfoundation/pathos