Pig: Hadoop jobs Fail

Posted on 2020-01-05 09:34:34

Question


I have a Pig script that queries data from a CSV file.

The script has been tested locally with small and large .csv files.

On a small cluster, it starts processing the script but fails after completing about 40% of the job.

The only error reported is: Failed to read data from "path to file"

What I infer is that the script could read the file, but that a connection dropped or a message was lost somewhere along the way.

Still, the error above is all I get.
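
For context, a minimal sketch of such a script (the path, schema, and filter here are hypothetical placeholders, not the original script):

-- hypothetical example: load a CSV and filter it
records = LOAD 'data.csv' USING PigStorage(',')
    AS (id:int, name:chararray, value:double);
filtered = FILTER records BY value > 0.0;
DUMP filtered;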


Answer 1:


An answer for the general problem is to change the error levels in the configuration files by adding these two lines to mapred-site.xml:

log4j.logger.org.apache.hadoop = error,A
log4j.logger.org.apache.pig = error,A
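
These are log4j properties, and they reference an appender named A; a minimal sketch of a matching appender definition (assuming a plain console appender, and that your distribution reads log4j properties from its conf directory) would be:

# hypothetical appender definition matching the 'A' referenced above
log4j.appender.A=org.apache.log4j.ConsoleAppender
log4j.appender.A.layout=org.apache.log4j.PatternLayout
log4j.appender.A.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n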

In my case, it was an OutOfMemoryError.




Answer 2:


Check your logs and increase the verbosity level if needed, but you are probably facing an out-of-memory error.

Check this answer on how to change Pig logging.

To change the memory available to Hadoop, edit the hadoop-env.sh file, as documented here:

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx128m ${HADOOP_CLIENT_OPTS}"
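
If the client-side heap is the limit, the same variable can be raised; for example (the 2048m figure is just an assumption, tune it to your cluster):

# example only: raise the client heap from the 128 MB default shown above
export HADOOP_CLIENT_OPTS="-Xmx2048m ${HADOOP_CLIENT_OPTS}"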

For Apache Pig, you have this in the header of the pig bash script:

# PIG_HEAPSIZE The maximum amount of heap to use, in MB.
# Default is 1000.

So you can export it in your shell or set it in your .bashrc file:

$ export PIG_HEAPSIZE=4096
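
To persist the setting across sessions, a sketch (the 4096 value is illustrative; the variable is read as a number of MB, so no unit suffix is needed):

# illustrative: persist the Pig heap size (in MB) across shell sessions
echo 'export PIG_HEAPSIZE=4096' >> ~/.bashrc
source ~/.bashrc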


Source: https://stackoverflow.com/questions/27524788/pig-hadoop-jobs-fail
