Question
I have worked on Hadoop MR for quite some time, and I have created and used custom (extended) Writable classes, including MapWritable. Now I am required to translate the same MR jobs I wrote in Java into Python. I have no experience with Python and am currently exploring the available libraries, looking at options such as Pydoop and mrjob. However, I want to know whether these libraries offer a way to create similar custom Writable classes, and if so, how to create them. If not, what alternatives exist for doing the same thing?
Answer 1:
In Pydoop, explicit support for custom Hadoop types is still a work in progress. In other words, right now we're not making things easy for the user, but it can be done with a bit of work. A couple of pointers:

- Pydoop already includes custom Java code, auto-installed together with the Python package as pydoop.jar. We pass this extra jar to Hadoop as needed. Adding more Java code is a matter of placing the source in src/ and listing it in JavaLib.java_files in setup.py.
- On the Python side, you need deserializers for the new types. See, for instance, LongWritableDeserializer in pydoop.mapreduce.pipes.
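To make the second pointer concrete: a deserializer's job is to decode the raw bytes Hadoop produces for a Writable. The sketch below is not Pydoop's actual deserializer API (the base-class interface it expects is not shown in this answer); it only illustrates, with the standard library, the byte-level decoding a LongWritable-style deserializer performs, plus a hypothetical custom type made of two longs.

```python
import struct

def deserialize_long_writable(data: bytes) -> int:
    """Decode a Hadoop LongWritable: 8 big-endian bytes,
    as written by Java's DataOutput.writeLong()."""
    (value,) = struct.unpack(">q", data[:8])
    return value

def serialize_long_writable(value: int) -> bytes:
    """Inverse of the above, for round-trip testing."""
    return struct.pack(">q", value)

def deserialize_pair_writable(data: bytes) -> tuple:
    """Hypothetical custom Writable whose write() emits two
    longs back to back; its deserializer just reads 16 bytes."""
    return struct.unpack(">qq", data[:16])
```

A real Pydoop deserializer would wrap logic like this in whatever class interface pydoop.mapreduce.pipes expects (see LongWritableDeserializer there for the authoritative shape), but the wire-format decoding itself is exactly this kind of struct work.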
Hope this helps.
Source: https://stackoverflow.com/questions/51643536/create-custom-writable-key-value-type-in-python-for-hadoop-map-reduce