Passing class functions to PySpark RDD


All Python dependencies have to be either present on the search path of the worker nodes or distributed manually using the SparkContext.addPyFile method, so something like this should do the trick:

sc.addPyFile("/some-folder/app/bin/file.py")

It will copy the file to all the workers and place it in their working directory.

On a side note, please don't use file as a module name, even if it is only an example. Shadowing Python built-ins is not a good idea.
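
Putting the pieces together, here is a minimal sketch of the full pattern. The module name tools.py (renamed from file.py per the note above), the Tools class, and the doubling logic are all hypothetical stand-ins for whatever your application actually ships; only sc.addPyFile itself comes from the answer above. The key point is importing the module inside the mapped function, so the import is resolved on the worker after addPyFile has made the file available there:

# tools.py -- hypothetical helper module to be shipped to the workers
class Tools:
    def transform(self, x):
        # any per-record logic lives here
        return x * 2

# driver script
from pyspark import SparkContext

sc = SparkContext(appName="addPyFileExample")

# Distribute the dependency; each executor gets a copy in its working directory
sc.addPyFile("/some-folder/app/bin/tools.py")

def apply_transform(x):
    # Import inside the function so the lookup happens on the worker,
    # where addPyFile has placed tools.py on the search path
    from tools import Tools
    return Tools().transform(x)

print(sc.parallelize([1, 2, 3]).map(apply_transform).collect())
# [2, 4, 6]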
