Create custom writable key/value type in python for Hadoop Map Reduce?

Submitted by 余生长醉 on 2019-12-24 10:47:30

Question


I have worked with Hadoop MapReduce for quite some time, and I have created and used custom (extension) Writable classes, including MapWritable. Now I need to translate the same MR job that I wrote in Java into Python. I have no experience with Python and am currently exploring the available libraries, such as Pydoop and mrjob. I want to know whether these libraries support creating similar custom Writable classes, and if so, how to create them. If not, what alternatives exist for achieving the same thing?


Answer 1:


In Pydoop, explicit support for custom Hadoop types is still WIP. In other words, right now we're not making things easy for the user, but it can be done with a bit of work. A couple of pointers:

  • Pydoop already includes custom Java code, auto-installed together with the Python package as pydoop.jar. We pass this extra jar to Hadoop as needed. Adding more Java code is a matter of placing the source in src/ and listing it in JavaLib.java_files in setup.py.

  • On the Python side, you need deserializers for the new types. See for instance LongWritableDeserializer in pydoop.mapreduce.pipes.
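To illustrate the second point: the exact interface Pydoop expects of a deserializer may differ across versions, but the core job is always the same, decoding Hadoop's wire format for a type into a Python value. A minimal sketch of what something like LongWritableDeserializer does (LongWritable is serialized as an 8-byte big-endian signed long; the class name and `deserialize` method here are illustrative, not Pydoop's actual API):

```python
import io
import struct


class LongWritableDeserializer:
    """Reads Hadoop LongWritable values (8-byte big-endian longs) from a stream."""

    def __init__(self, stream):
        self.stream = stream

    def deserialize(self):
        raw = self.stream.read(8)
        if len(raw) < 8:
            raise EOFError("truncated LongWritable")
        # ">q" = big-endian signed 64-bit integer, matching Java's DataOutput.writeLong
        return struct.unpack(">q", raw)[0]
```

A deserializer for your own custom type would follow the same pattern, mirroring whatever byte layout your Java Writable's write() method produces.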

Hope this helps.
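As a further sketch of the general idea: a custom Python type can mirror a Java Writable's write()/readFields() pair by reading and writing the exact byte layout the Java side uses. The PointWritable below is entirely hypothetical, chosen only to show the symmetry; a real implementation must match your Java class's serialization byte-for-byte.

```python
import struct


class PointWritable:
    """Hypothetical custom type: two ints, serialized big-endian (4 bytes each),
    matching what Java's DataOutput.writeInt would produce."""

    def __init__(self, x=0, y=0):
        self.x, self.y = x, y

    def write(self, stream):
        # mirrors Writable.write(DataOutput)
        stream.write(struct.pack(">ii", self.x, self.y))

    def read_fields(self, stream):
        # mirrors Writable.readFields(DataInput)
        self.x, self.y = struct.unpack(">ii", stream.read(8))
```

Round-tripping an instance through an in-memory stream is a quick way to check that both directions agree.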



Source: https://stackoverflow.com/questions/51643536/create-custom-writable-key-value-type-in-python-for-hadoop-map-reduce
