spark-submit - cannot pickle class in package, but can pickle 'same' class in root folder

天涯浪子 提交于 2019-12-24 02:18:58

问题


In my Python-based Spark task 'main.py', I reference a protobuf generated class 'a_pb2.py'. If I place all files in the root directory like

/
- main.py
- a_pb2.py

and zip a_pb2.py into 'proto.zip', then run

spark-submit --py-files=proto.zip main.py

everything runs as expected.

However, if I move the protobuf classes to a package, organizing my files like

/
- main.py
- /protofiles
  - __init__.py
  - a_pb2.py

and zip /protofiles into 'proto.zip', then run

spark-submit --py-files=proto.zip main.py

the spark task fails, saying it cannot pickle class Account within a_pb2

Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/usr/local/Cellar/apache-spark/2.2.1/libexec/python/lib/pyspark.zip/pyspark/worker.py", line 177, in main process() File "/usr/local/Cellar/apache-spark/2.2.1/libexec/python/lib/pyspark.zip/pyspark/worker.py", line 172, in process serializer.dump_stream(func(split_index, iterator), outfile) File "/usr/local/Cellar/apache-spark/2.2.1/libexec/python/lib/pyspark.zip/pyspark/serializers.py", line 272, in dump_stream bytes = self.serializer.dumps(vs) File "/usr/local/Cellar/apache-spark/2.2.1/libexec/python/lib/pyspark.zip/pyspark/serializers.py", line 451, in dumps return pickle.dumps(obj, protocol) _pickle.PicklingError: Can't pickle class 'a_pb2.Account': import of module 'a_pb2' failed

I am assuming this class serialization is happening to distribute the dependencies to worker nodes. But wouldn't this serialization also happen in the initial case, and if the class was unpickle-able, it should fail then as well? Needless to say I am confused, and not familiar with the nuances of how spark treats dependency packages+modules vs just modules.

Any suggestions appreciated!

来源:https://stackoverflow.com/questions/49540313/spark-submit-cannot-pickle-class-in-package-but-can-pickle-same-class-in-ro

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!