Hadoop cluster - Do I need to replicate my code over all machines before running a job?

Submitted by 限于喜欢 on 2019-11-29 15:16:06

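No. With Hadoop Streaming, if the code is not already present on the worker nodes, ship it and its dependencies with the -file flag. Hadoop copies each listed file to the nodes that run the tasks and places it in each task's working directory, so the scripts can open it by its plain file name. Make sure the map/reduce scripts and every file they depend on are listed in the streaming command. (In newer Hadoop releases -file is deprecated in favor of the generic -files option, which takes a comma-separated list and must appear before the streaming-specific options.)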

$HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper myPythonScript.py \
    -reducer /bin/wc \
    -file myPythonScript.py \
    -file myDictionary.txt
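To see why every dependency must be listed, you can approximate a streaming task locally before submitting the job. The following is a minimal sketch under stated assumptions, not Hadoop itself. mapper.sh and the dictionary contents are hypothetical stand-ins: a shell mapper replaces myPythonScript.py, `wc -l` replaces /bin/wc so the reducer prints a single count, and a plain `sort` stands in for Hadoop's shuffle. The script runs the same pipeline twice. The first run uses a directory that holds both shipped files. The second uses a directory that holds only the mapper, which mimics a forgotten `-file myDictionary.txt`.

```shell
#!/bin/sh
# Local approximation of a streaming task (a sketch; assumes hypothetical
# stand-in files, not the real myPythonScript.py / myDictionary.txt).
set -eu

src=$(mktemp -d)

# Hypothetical mapper: echoes each input word that appears in myDictionary.txt.
# It opens the dictionary by plain name from its current working directory,
# which is where -file places shipped files on a worker node.
# grep errors are suppressed, so a missing dictionary means no matches.
cat > "$src/mapper.sh" <<'EOF'
#!/bin/sh
while read -r word; do
  if grep -qxF -- "$word" myDictionary.txt 2>/dev/null; then
    echo "$word"
  fi
done
EOF
printf 'apple\nbanana\n' > "$src/myDictionary.txt"

# run_task DIR: inside DIR, run map -> sort -> reduce and print the line count.
# The mapper is invoked via sh so the sketch does not depend on exec permissions.
run_task() {
  (cd "$1" && printf 'apple\ncherry\nbanana\napple\n' \
    | sh ./mapper.sh | sort | wc -l | tr -d ' ')
}

# Task directory with both files shipped (like the two -file flags above).
shipped=$(mktemp -d)
cp "$src/mapper.sh" "$src/myDictionary.txt" "$shipped/"

# Task directory where the dictionary was not shipped.
missing=$(mktemp -d)
cp "$src/mapper.sh" "$missing/"

with_dict=$(run_task "$shipped")
without_dict=$(run_task "$missing")
echo "with dictionary: $with_dict, without dictionary: $without_dict"

rm -rf "$src" "$shipped" "$missing"
```

Running it prints `with dictionary: 3, without dictionary: 0`. On a real cluster, a file left off the -file list typically causes no error at submission time. The problem shows up only when the tasks run and cannot find the file, so comparing the -file arguments against every file the mapper and reducer open is a cheap check before submitting.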