问题
I’m trying to generate predictions from a pickled model with pyspark, I get the model with the following command
model = deserialize_python_object(filename)
with deserialize_python_object(filename)
defined as:
import pickle
def deserialize_python_object(filename):
try:
with open(filename, ‘rb’) as f:
obj = pickle.load(f)
except:
obj = None
return obj
the error log looks like:
File “/Users/gmg/anaconda3/envs/env/lib**strong text**/python3.7/site-packages/pyspark/sql/udf.py”, line 189, in wrapper
return self(*args)
File “/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/sql/udf.py”, line 167, in __call__
judf = self._judf
File “/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/sql/udf.py”, line 151, in _judf
self._judf_placeholder = self._create_judf()
File “/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/sql/udf.py”, line 160, in _create_judf
wrapped_func = _wrap_function(sc, self.func, self.returnType)
File “/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/sql/udf.py”, line 35, in _wrap_function
pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
File “/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/rdd.py”, line 2420, in _prepare_for_python_RDD
pickled_command = ser.dumps(command)
File “/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/serializers.py”, line 600, in dumps
raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not serialize object: TypeError: can’t pickle _abc_data objects
回答1:
Seems that you are having the same problem like in this issue: https://github.com/cloudpipe/cloudpickle/issues/180
What is happening is that pyspark's cloudpickle library is outdated for python 3.7, you should fix the problem with this crafted patch by now until pyspark gets that module updated.
Try using this workaround:
Install cloudpickle
pip install cloudpickle
Add this to your code:
import cloudpickle
import pyspark.serializers
pyspark.serializers.cloudpickle = cloudpickle
monkeypatch credit https://github.com/cloudpipe/cloudpickle/issues/305
来源:https://stackoverflow.com/questions/59058588/pyspark-got-typeerror-can-t-pickle-abc-data-objects