Question
My original issue is that I am trying to do the following:
from multiprocessing import Pool

def submit_decoder_process(decoder, input_line):
    decoder.process_line(input_line)
    return decoder

self.pool = Pool(processes=num_of_processes)
# decoder must be pickled to reach the worker process
self.pool.apply_async(submit_decoder_process, [decoder, input_line]).get()
decoder is a bit involved to describe here, but the important thing is that it is an object initialized with a PyParsing expression that calls setParseAction(). Pickle, which multiprocessing relies on, fails on such an object, and this in turn makes the code above fail.
Here is the pickle/PyParsing problem, isolated and simplified. The following code fails with a pickling error:
import pickle
from pyparsing import *

def my_pa_func():
    pass

pickle.dumps(Word(nums).setParseAction(my_pa_func))
Error message:
pickle.PicklingError: Can't pickle <function wrapper at 0x00000000026534A8>: it's not found as pyparsing.wrapper
If you remove the call to .setParseAction(my_pa_func), it works without problems:
pickle.dumps(Word(nums))
How can I get around this? Multiprocessing uses pickle, so I guess I can't avoid it. The pathos package, which supposedly uses dill, is not mature enough; at least, I am having problems installing it on my 64-bit Windows machine. I am really scratching my head here.
Answer 1:
OK, here is a solution inspired by rocksportrocker's answer to "Python multiprocessing pickling error".
The idea is to serialize the unpicklable object with dill before passing it between processes, and then "undill" it once it has arrived:
from multiprocessing import Pool
import dill

def submit_decoder_process(decoder_dill, input_line):
    decoder = dill.loads(decoder_dill)  # undill after it was passed to a pool process
    decoder.process_line(input_line)
    return dill.dumps(decoder)  # dill before passing back to the parent process

self.pool = Pool(processes=num_of_processes)
# Dill before sending to a pool process, undill the returned bytes
decoder_processed = dill.loads(
    self.pool.apply_async(submit_decoder_process, [dill.dumps(decoder), input_line]).get()
)
Answer 2:
https://docs.python.org/2/library/pickle.html#what-can-be-pickled-and-unpickled
multiprocessing.Pool uses the pickle protocol to serialize function and module names (in your example, setParseAction and pyparsing), which are delivered through a pipe to the child process.
Once the child process receives them, it imports the module and tries to call the function. The problem is that what you're passing is not a top-level function but a method. To handle it, pickle would have to be clever enough to rebuild the Word object with the user-supplied parameters and then call its setParseAction method. Because handling such cases is too complicated, pickle does not let you serialize functions that are not defined at the top level of a module.
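The top-level-function restriction can be seen without pyparsing at all. The error message in the question refers to a function named wrapper; this suggests pyparsing wraps the parse action in a nested function. The sketch below uses a hypothetical wrap helper that does the same, plus a hypothetical is_picklable checker; neither is part of pyparsing or pickle.

```python
import pickle


def my_pa_func():
    """Top-level function: pickle stores it by reference (module + name)."""
    pass


def wrap(func):
    """Hypothetical helper that wraps the user's function in a nested
    function named ``wrapper``, as the error message suggests pyparsing does."""
    def wrapper(*args):
        return func(*args)
    return wrapper


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Depending on the Python version, a nested function raises
        # PicklingError or AttributeError ("Can't pickle local object");
        # some other objects (e.g. locks) raise TypeError.
        return False
    return True
```

When run as a script, is_picklable(my_pa_func) should be True, while is_picklable(wrap(my_pa_func)) is False: the nested wrapper cannot be looked up by name in its module, which is the same failure the question reports for pyparsing.wrapper.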
To solve your issue, either tell the pickle module how to serialize the setParseAction method (https://docs.python.org/2/library/pickle.html#pickle-protocol), or refactor your code so that whatever is passed to Pool.apply_async is serializable.
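One way to tell pickle how to handle such an object is the __getstate__/__setstate__ pair from the pickle protocol: leave the unpicklable parser out of the pickled state and rebuild it after unpickling. This is a sketch under assumptions: Decoder and build_parser are hypothetical stand-ins for the question's decoder and its PyParsing expression, and the nested parse function plays the role of the unpicklable parse action.

```python
import pickle


def build_parser():
    """Hypothetical stand-in for building the PyParsing expression.

    Returns a nested function, which pickle cannot serialize by reference.
    """
    def parse(line):
        return [int(tok) for tok in line.split()]
    return parse


class Decoder:
    """Hypothetical decoder that holds an unpicklable parser."""

    def __init__(self):
        self.results = []
        self.parser = build_parser()

    def process_line(self, line):
        self.results.append(self.parser(line))

    def __getstate__(self):
        # Copy the instance dict and drop the unpicklable parser.
        state = self.__dict__.copy()
        del state["parser"]
        return state

    def __setstate__(self, state):
        # Restore the plain data, then rebuild the parser in this process.
        self.__dict__.update(state)
        self.parser = build_parser()
```

With this in place, pickle.loads(pickle.dumps(decoder)) yields a working Decoder, so the original apply_async call could pass the object directly. The trade-off is that the parser is rebuilt on every unpickle, which matters only if building it is expensive.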
What if you pass the Word object to the child process and let the child call setParseAction() on it?
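The refactoring idea above can be pushed one step further: build the parser inside each worker process, so only plain strings and lists cross process boundaries. A minimal Python 3 sketch, assuming the hypothetical build_parser below stands in for constructing Word(nums).setParseAction(...); it uses the initializer argument of multiprocessing.Pool to run setup once per worker.

```python
from multiprocessing import Pool

# Per-process parser; each worker builds its own, so it is never pickled.
_parser = None


def build_parser():
    """Hypothetical stand-in for building the PyParsing expression."""
    def parse(line):
        return [int(tok) for tok in line.split()]
    return parse


def init_worker():
    # Passed as Pool(initializer=...): runs once in each worker process.
    global _parser
    _parser = build_parser()


def process_line(line):
    # Build lazily if the initializer has not run (e.g. a direct call).
    if _parser is None:
        init_worker()
    # Only the input string and the plain-list result get pickled.
    return _parser(line)


def run_in_pool(lines, num_of_processes=2):
    with Pool(processes=num_of_processes, initializer=init_worker) as pool:
        return pool.map(process_line, lines)
```

Called from under an if __name__ == "__main__": guard (required on Windows), run_in_pool(["1 2", "3"]) should return [[1, 2], [3]]. The design choice is to keep the unpicklable object entirely process-local instead of teaching pickle about it.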
Answer 3:
I'd suggest pathos.multiprocessing, as you mention. Of course, I'm the pathos author, so I guess that's not a surprise. It appears that there might be a distutils bug that you are running into, as referenced here: https://github.com/uqfoundation/pathos/issues/49.
Your solution using dill is a good workaround. You also might be able to forgo installing the entire pathos package and just install the pathos fork of the multiprocessing package (which uses dill instead of pickle). You can find it here: http://dev.danse.us/packages or here: https://github.com/uqfoundation/pathos/tree/master/external.
Source: https://stackoverflow.com/questions/27883574/cant-pickle-pyparsing-expression-with-setparseaction-method-needed-for-multi