Error 1121 importing external library in Pig UDF in Jython

ⅰ亾dé卋堺 submitted on 2020-01-15 19:16:51

Question


I'm having a problem using the Python library simplejson in Jython to write a Pig UDF. I need it because jython-standalone-2.5.2.jar doesn't come with a JSON library. I'm using Apache Pig version 0.11.0-cdh4.4.0 (rexported) compiled Sep 03 2013, 20:25:46. According to the documentation at http://pig.apache.org/docs/r0.11.1/udf.html#python-advanced: "You can import Python modules in your Python script. Pig resolves Python dependencies recursively, which means Pig will automatically ship all dependent Python modules to the backend. Python modules should be found in the jython search path: JYTHON_HOME, JYTHON_PATH, or current directory." So I downloaded the library from https://pypi.python.org/pypi/simplejson/ and unzipped it in my working directory, and my script then works in local mode (with -x local). In cluster mode, however, I get this error in the logs of the failed tasks on the task tracker:

Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1121: Python Error. Traceback (most recent call last):
  File "ejercicio4-udfs.py", line 8, in <module>
ImportError: No module named simplejson

    at org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:231)
    at org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.init(JythonScriptEngine.java:158)
    at org.apache.pig.scripting.jython.JythonScriptEngine.getFunction(JythonScriptEngine.java:349)
    at org.apache.pig.scripting.jython.JythonFunction.<init>(JythonFunction.java:55)
    ... 92 more
Caused by: Traceback (most recent call last):
  File "ejercicio4-udfs.py", line 8, in <module>
ImportError: No module named simplejson

I've tried several things, such as zipping simplejson, registering the zip, and trying to access it with sys.path.append('simplejson.zip'). I've also tried:

export JYTHONPATH=$JYTHONPATH:$(pwd)/simplejson.zip; pig script.pig

and also

pig -Dmapred.cache.files="simplejson.zip#simplejson.zip" -Dmapred.create.symlink=yes script.zip

Answer 1:


I don't know if my answer comes too late, but I managed to import simplejson in a UDF.

Here is how I did it:

I downloaded simplejson and put it into a lib folder, then in my UDF I did this:

import sys
# Make the folder that contains the simplejson package importable
sys.path.append('/path/to/your/lib/folder')
import simplejson as json

I was then able to call json.loads() without any problem on my cluster.
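
As a minimal sketch of what a whole UDF file might look like with this approach: the function name json_keys, the schema string, and the fallback to the standard json module are illustrative assumptions, not taken from my actual script. The lib folder path is a placeholder, and it presumably has to exist at that same location on every node that runs tasks, since the import happens on the backend.

```python
import sys

# Placeholder path: the folder that contains the unpacked simplejson package.
LIB_DIR = '/path/to/your/lib/folder'
if LIB_DIR not in sys.path:
    sys.path.append(LIB_DIR)

try:
    import simplejson as json
except ImportError:
    # Assumption for illustration only: fall back to the standard json module
    # so the file can be checked outside the cluster. jython-standalone-2.5.2
    # has no json module, so this fallback would not help there.
    import json

try:
    # Pig makes the outputSchema decorator available to Jython UDF scripts.
    outputSchema
except NameError:
    # Assumption: a no-op stand-in so the file also runs outside Pig.
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


@outputSchema("keys:chararray")
def json_keys(line):
    """Return the sorted top-level keys of a JSON object as 'a,b,c', or None."""
    if line is None:
        return None
    try:
        record = json.loads(line)
    except ValueError:
        # Both simplejson and the standard json module signal bad input
        # with ValueError or a subclass of it.
        return None
    if not isinstance(record, dict):
        return None
    return ','.join(sorted(record.keys()))
```

From Pig, a file like this (saved under a hypothetical name such as my_udfs.py) would be registered with REGISTER 'my_udfs.py' USING jython AS myudfs; and the function called as myudfs.json_keys(line).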

Hope it helps.



Source: https://stackoverflow.com/questions/21504026/error-1121-importing-external-library-in-pig-udf-in-jython
