Pyspark got TypeError: can’t pickle _abc_data objects

Submitted by 杀马特。学长 韩版系。学妹 on 2020-06-28 04:18:16

Question


I'm trying to generate predictions from a pickled model with PySpark. I load the model with the following command:

model = deserialize_python_object(filename)

with deserialize_python_object(filename) defined as:

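import pickle

def deserialize_python_object(filename):
    """Load a pickled object from filename; return None if loading fails."""
    try:
        with open(filename, 'rb') as f:
            obj = pickle.load(f)
    except Exception:
        # Any failure (missing file, corrupt pickle, ...) yields None
        obj = None
    return obj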

The error log looks like:

  File "/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/sql/udf.py", line 189, in wrapper
    return self(*args)
  File "/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/sql/udf.py", line 167, in __call__
    judf = self._judf
  File "/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/sql/udf.py", line 151, in _judf
    self._judf_placeholder = self._create_judf()
  File "/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/sql/udf.py", line 160, in _create_judf
    wrapped_func = _wrap_function(sc, self.func, self.returnType)
  File "/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/sql/udf.py", line 35, in _wrap_function
    pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
  File "/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/rdd.py", line 2420, in _prepare_for_python_RDD
    pickled_command = ser.dumps(command)
  File "/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/serializers.py", line 600, in dumps
    raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not serialize object: TypeError: can't pickle _abc_data objects

Answer 1:


It looks like you are hitting the same problem as in this issue: https://github.com/cloudpipe/cloudpickle/issues/180

What is happening is that the cloudpickle library bundled with pyspark is outdated for Python 3.7. Until pyspark updates that module, you can work around the problem with the patch below.
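
To see where the `_abc_data` in the error message comes from, here is a minimal sketch using only the standard library. It assumes CPython 3.7 or later, where each ABC class carries a private `_abc_impl` attribute. `Base` and `try_pickle` are hypothetical names for illustration, not part of the question's code. My reading of the linked issue is that older cloudpickle versions serialize such classes by value and so run into this attribute, but treat that as an assumption.

```python
import abc
import pickle


class Base(abc.ABC):
    """Hypothetical abstract class, standing in for one the model depends on."""


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the exception raised."""
    try:
        pickle.dumps(obj)
    except Exception as exc:
        return exc
    return None


# On CPython 3.7+, ABC bookkeeping lives in a C-level object stored on the
# class as `_abc_impl` (a private detail; other interpreters may not have it).
impl = getattr(Base, "_abc_impl", None)

# That object has no pickle support, so serializing it directly fails.
impl_error = try_pickle(impl) if impl is not None else None

print("type of _abc_impl:", type(impl).__name__)
print("pickle result:", repr(impl_error))
```

The exact wording of the exception differs between Python versions, but on interpreters that have `_abc_impl` it should be a `TypeError` that mentions `_abc_data`.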

Try using this workaround:

  1. Install cloudpickle: pip install cloudpickle

  2. Add this to your code:

import cloudpickle
import pyspark.serializers
pyspark.serializers.cloudpickle = cloudpickle

Monkeypatch credit: https://github.com/cloudpipe/cloudpickle/issues/305
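
If the same script also has to run where cloudpickle or pyspark may be missing, the assignment can be wrapped in a small guard. This is only a sketch: `patch_pyspark_cloudpickle` is a hypothetical helper name, and it assumes the patch runs before any UDF is serialized, so calling it near the top of the script is the safe choice.

```python
def patch_pyspark_cloudpickle():
    """Point pyspark.serializers at the pip-installed cloudpickle.

    Returns True if the patch was applied, False if either package
    could not be imported (in which case nothing is changed).
    """
    try:
        import cloudpickle
        import pyspark.serializers
    except ImportError:
        return False
    # Same assignment as in the workaround above.
    pyspark.serializers.cloudpickle = cloudpickle
    return True


patched = patch_pyspark_cloudpickle()
print("cloudpickle patch applied:", patched)
```

Newer pyspark releases bundle a more recent cloudpickle, so the patch may be unnecessary there. Checking the installed versions first is worthwhile.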



Source: https://stackoverflow.com/questions/59058588/pyspark-got-typeerror-can-t-pickle-abc-data-objects
