Question
I am writing a MapReduce job in Python and want to use some third-party libraries such as chardet.
I know that we can use the option -libjars=... to include them for a Java MapReduce job.
But how do I include third-party libraries in a Python MapReduce job?
Thank you!
Answer 1:
The problem was solved with zipimport.
I zipped the chardet package into a file named module.mod and loaded it like this:
import zipimport
# Load the chardet package from the zip archive shipped with the job
importer = zipimport.zipimporter('module.mod')
chardet = importer.load_module('chardet')
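For readers who want to try the packaging step locally, here is a minimal sketch of building such an archive and importing from it. It is not the exact procedure from the answer: the package `demo_pkg` and the helpers `build_archive` and `import_from_archive` are hypothetical stand-ins (a real job would zip the chardet directory instead), and the sketch puts the archive on `sys.path` rather than calling `load_module`, since `load_module` is deprecated in recent Python versions. Both routes use the same zipimport machinery.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_archive(archive_path, package_name, source):
    """Write a one-file package into a zip archive at archive_path."""
    with zipfile.ZipFile(archive_path, "w") as zf:
        # The package directory must sit at the root of the archive.
        zf.writestr(package_name + "/__init__.py", source)


def import_from_archive(archive_path, package_name):
    """Import package_name from the zip archive by adding it to sys.path."""
    if archive_path not in sys.path:
        sys.path.insert(0, archive_path)
    importlib.invalidate_caches()
    return importlib.import_module(package_name)


# Hypothetical stand-in for chardet: a tiny package with one function.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "module.mod")
build_archive(archive, "demo_pkg", "def shout(text):\n    return text.upper()\n")

demo_pkg = import_from_archive(archive, "demo_pkg")
print(demo_pkg.shout("hello"))
```

The archive does not need a .zip extension; Python checks the file contents, so a name like module.mod works as well.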
Add -file module.mod to the Hadoop streaming command.
Now chardet can be used in the script.
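As a rough illustration of where -file module.mod fits, the sketch below assembles a streaming command as an argument list. Everything except `-file module.mod` is an assumption for the example: the jar path, script names, and HDFS paths are hypothetical, and `streaming_command` is a helper invented here. It also assumes mapper.py and reducer.py are executable scripts with a shebang line.

```python
import shlex


def streaming_command(streaming_jar, archive, mapper, reducer, input_path, output_path):
    """Build the argument list for a Hadoop streaming job that ships a zip archive."""
    return [
        "hadoop", "jar", streaming_jar,
        # Ship the zipped library and the scripts to every task.
        "-file", archive,
        "-file", mapper,
        "-file", reducer,
        "-mapper", mapper,
        "-reducer", reducer,
        "-input", input_path,
        "-output", output_path,
    ]


# All paths below are placeholders, not values from the original post.
cmd = streaming_command(
    "hadoop-streaming.jar",
    "module.mod",
    "mapper.py",
    "reducer.py",
    "/user/demo/input",
    "/user/demo/output",
)
print(" ".join(shlex.quote(part) for part in cmd))
```

Files passed with -file are placed in the task's working directory, which is why the script can open module.mod by its bare name.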
More details are in: How can I include a python package with Hadoop streaming job?
Source: https://stackoverflow.com/questions/15352981/hadoop-how-to-include-third-party-library-in-python-mapreduce