How do you use Python UDFs with Pig in Elastic MapReduce?

深忆病人 2020-12-11 08:20

I really want to take advantage of Python UDFs in Pig on our AWS Elastic MapReduce cluster, but I can't quite get things to work properly. No matter what I try, my pig job

4 Answers
  •  星月不相逢
    2020-12-11 08:41

    To clarify some of what I just read here: at this point, using a Python UDF that is stored on S3 from Pig running on EMR is as simple as this line in your Pig script:

    REGISTER 's3://path/to/bucket/udfs.py' using jython as mynamespace;

    That is, no classpath modifications are necessary. I'm using this in production right now, with the caveat that I'm not pulling any additional Python modules into my UDF. I think importing extra modules may change what you need to do to make it work.
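For reference, here is a minimal sketch of what a `udfs.py` registered this way might contain. The function name `to_upper` and its schema are hypothetical, chosen only for illustration. As far as I know, Pig makes the `outputSchema` decorator available to scripts registered with `using jython`, so the fallback definition below is only an assumption-driven convenience that lets the same file run under a plain Python interpreter for local testing.

```python
# udfs.py: hypothetical UDF module (names are illustrative, not from the original post)

try:
    # Under Pig's Jython engine this name is expected to be defined already.
    outputSchema
except NameError:
    # Fallback for running the file outside Pig: a no-op decorator.
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


@outputSchema("word:chararray")
def to_upper(word):
    # Pig passes nulls as None, so guard before calling string methods.
    if word is None:
        return None
    return word.upper()
```

With the `REGISTER` line above, a function like this would be called from the Pig script through the namespace, for example `mynamespace.to_upper(name)` inside a `FOREACH ... GENERATE`. It sticks to built-in string methods, which matches the caveat about not importing additional modules.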
