I really want to take advantage of Python UDFs in Pig on our AWS Elastic MapReduce cluster, but I can't quite get things to work properly. No matter what I try, my Pig job
Hmm... to clarify some of what I just read here: at this point, using a Python UDF stored on S3 from Pig running on EMR is as simple as adding this line to your Pig script:
REGISTER 's3://path/to/bucket/udfs.py' using jython as mynamespace;
That is, no classpath modifications are necessary. I'm using this in production right now, with the caveat that my UDF doesn't pull in any additional Python modules. I think that may affect what you need to do to make it work.
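For reference, here is a minimal sketch of what `udfs.py` might contain. The `word_count` function is hypothetical and uses no extra modules, matching the setup described above. Pig's Jython engine normally provides the `outputSchema` decorator when it loads the file. The fallback stub is only an assumption added so the same file can also be imported and tested under plain Python outside Pig.

```python
# udfs.py: hypothetical example UDF file for REGISTER ... using jython

try:
    outputSchema  # provided by Pig's Jython script engine at load time
except NameError:
    # Fallback stub (assumption) so the file also imports under plain Python
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


@outputSchema("word_count:int")
def word_count(text):
    """Return the number of whitespace-separated words, or None for null input."""
    if text is None:
        return None  # Pig passes nulls through as None
    return len(text.split())
```

Once registered, the function would be called from Pig with the namespace prefix, for example `mynamespace.word_count(line)` inside a `FOREACH ... GENERATE` statement.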