Python UDFs in Pig

一世执手 submitted on 2019-12-24 13:43:02

Question


I've seen the documentation here, but I confess I find it rather lacking. Could anyone give me a collection of examples of incorporating Python UDFs into Pig? In particular:

  • Prior to Pig 0.10, the boolean type does not exist, but a FILTER operation requires its condition to resolve to a boolean. Am I forever cursed with returning 1 or 0 and writing FILTER alias BY py_udf.f(field) > 0 if I don't have the latest version?
  • Are the Algebraic, Accumulator, and Filter interfaces inaccessible from Python?
  • Can I not access the Distributed Cache either?
  • What about Store/Load functions?
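A minimal sketch of the 1/0 workaround described in the first bullet, assuming a Jython UDF file. The file name udfs.py, the function is_long_word, and the five-character threshold are hypothetical. Pig normally supplies the outputSchema decorator when it registers a Jython script; the fallback stub below is only there so the same file can be imported and tested with plain Python.

```python
# udfs.py (hypothetical file name). Hypothetical Pig usage:
#   REGISTER 'udfs.py' USING jython AS py_udf;
#   long_words = FILTER words BY py_udf.is_long_word(word) > 0;

# Assumption: Pig injects `outputSchema` when it loads the script.
# This no-op stub is only defined when running outside Pig.
if "outputSchema" not in globals():
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator

MIN_LENGTH = 5  # hypothetical threshold


@outputSchema("flag:int")
def is_long_word(word):
    """Return 1 if word has at least MIN_LENGTH characters, else 0.

    Before Pig 0.10 there is no boolean type, so the UDF returns an int
    and the FILTER compares the result with > 0.
    """
    if word is None:
        return 0
    return 1 if len(word) >= MIN_LENGTH else 0
```

On Pig 0.10 or later, declaring the schema as flag:boolean and returning True/False should let the FILTER use the UDF directly without the `> 0`; treat that as an assumption to verify against your Pig version.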

Answer 1:


Python UDFs are quite limited. You cannot use Algebraic or Accumulator interfaces, nor can you write a LoadFunc in Python. For anything more complicated than a map operation you will likely need to resort to a Java UDF.
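To illustrate the limitation: a Python UDF can still aggregate over a grouped bag, but only as a plain eval function, so Pig cannot use the combiner or feed the bag incrementally the way it can for an Algebraic or Accumulator Java UDF, and the whole bag has to be materialized. This sketch assumes Pig hands the bag to Jython as a list of tuples; the names bag_sum, sales, region and amount are hypothetical.

```python
# Hypothetical Pig usage:
#   REGISTER 'udfs.py' USING jython AS py_udf;
#   grouped = GROUP sales BY region;
#   totals  = FOREACH grouped GENERATE group, py_udf.bag_sum(sales.amount);

# Assumption: Pig provides `outputSchema`; stub only for running outside Pig.
if "outputSchema" not in globals():
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


@outputSchema("total:double")
def bag_sum(bag):
    """Sum the first field of every tuple in a bag, skipping nulls.

    This is an ordinary eval UDF: it sees the entire bag at once, with
    no Algebraic or Accumulator optimizations available.
    """
    total = 0.0
    for t in bag or []:
        if t and t[0] is not None:
            total += t[0]
    return total
```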

That said, a more complex Python UDF with a dynamic outputSchema can be found at http://ragrawal.wordpress.com/2013/02/24/on-writing-python-udf-for-pig-a-perspective/. This likely won't help you, but it will give you a better understanding of what Python UDFs can do.
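Pig's Jython support also provides outputSchemaFunction and schemaFunction decorators, which compute a UDF's output schema from its input schema; this is the kind of dynamic outputSchema the linked post discusses. The sketch below is modeled on the square example in Pig's Python UDF documentation. The fallback stubs exist only so it runs outside Pig, and under Pig the schema function receives Pig's Java Schema object rather than a plain Python value.

```python
# Hypothetical Pig usage:
#   REGISTER 'udfs.py' USING jython AS py_udf;
#   squared = FOREACH nums GENERATE py_udf.square(n);

# Assumption: Pig provides these decorators; stubs only for running outside Pig.
if "outputSchemaFunction" not in globals():
    def outputSchemaFunction(schema_func_name):
        def decorator(func):
            return func
        return decorator

if "schemaFunction" not in globals():
    def schemaFunction(schema_func_name):
        def decorator(func):
            return func
        return decorator


@outputSchemaFunction("squareSchema")
def square(num):
    """Square a number; its output type is decided by squareSchema."""
    if num is None:
        return None
    return num * num


@schemaFunction("squareSchema")
def squareSchema(input_schema):
    """Return the input schema unchanged, so the output type follows the
    input type (an int in gives an int out, a double gives a double)."""
    return input_schema
```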




Answer 2:


This may not answer most of your specific questions, but this blog post and its linked code contain several good examples of using Pig with Python, including use of Store/Load functions and how they interact with Python.



Source: https://stackoverflow.com/questions/10808838/python-udfs-in-pig
