Hadoop: How to include third party library in Python MapReduce [duplicate]

Posted by 大城市里の小女人 on 2019-12-11 08:34:20

Question


I am writing a MapReduce job in Python and want to use some third-party libraries such as chardet.

I know that for Java MapReduce we can use the -libjars option to include them.

But how can third-party libraries be included in a Python MapReduce job?

Thank you!


Answer 1:


The problem was solved with zipimport.

I zipped chardet into a file named module.mod and loaded it like this:

import zipimport

# module.mod is the zip archive that contains the chardet package
importer = zipimport.zipimporter('module.mod')
chardet = importer.load_module('chardet')

Then add -file module.mod to the Hadoop streaming command so that the archive is shipped with the job.

Now chardet can be used in the script.

More details are given in: How can I include a python package with Hadoop streaming job?
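For a way to try the same idea locally without Hadoop, here is a minimal sketch. It assumes a hypothetical stand-in package called mylib in place of chardet, and the helper names build_archive and import_from_archive are made up for illustration. Instead of calling load_module directly, it puts the archive on sys.path, which lets Python's normal import machinery use zipimport behind the scenes. The archive keeps the module.mod name from above.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_archive(path):
    # Hypothetical stand-in package; in the real job this would be
    # the chardet directory zipped into module.mod.
    source = "def detect(data):\n    return {'length': len(data)}\n"
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("mylib/__init__.py", source)


def import_from_archive(archive_path, name):
    # A zip archive on sys.path is handled by zipimport, whatever
    # its file extension is.
    if archive_path not in sys.path:
        sys.path.insert(0, archive_path)
    importlib.invalidate_caches()
    return importlib.import_module(name)


archive = os.path.join(tempfile.mkdtemp(), "module.mod")
build_archive(archive)
mylib = import_from_archive(archive, "mylib")
result = mylib.detect(b"hello")
print(result)
```

In a streaming job, the archive would sit in the task's working directory (shipped with -file), so the mapper could pass the bare file name module.mod to such a helper.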



Source: https://stackoverflow.com/questions/15352981/hadoop-how-to-include-third-party-library-in-python-mapreduce
