Hadoop global variable with streaming

Posted by 牧云@^-^@ on 2019-12-11 02:58:47

Question


I understand that I can pass a global value to my mappers through the Job and its Configuration.

But how can I do that with Hadoop Streaming (Python, in my case)?

What is the right way?


Answer 1:


According to the Hadoop Streaming docs, you can pass the command-line option -cmdenv name=value to set an environment variable on each machine that runs your tasks. Your mappers and reducers can then read that variable from their environment:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -input input.txt \
    -output output.txt \
    -mapper mapper.py \
    -reducer reducer.py \
    -file mapper.py \
    -file reducer.py \
    -cmdenv MY_PARAM=thing_I_need
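
On the Python side, the value arrives as an ordinary environment variable. Here is a minimal sketch of what mapper.py could look like, assuming all you want to do is tag each input line with the parameter. The helper name tag_lines and the fallback "default_value" are made up for illustration and are not part of Hadoop:

```python
#!/usr/bin/env python
import os
import sys


def tag_lines(lines, param):
    """Yield one tab-separated 'param<TAB>line' record per non-empty line.

    tag_lines is a hypothetical helper, used only for this illustration.
    """
    for line in lines:
        line = line.strip()
        if line:
            yield "%s\t%s" % (param, line)


def main():
    # MY_PARAM is the variable set with -cmdenv MY_PARAM=thing_I_need.
    # The fallback value is an assumption, handy for running the script locally.
    param = os.environ.get("MY_PARAM", "default_value")
    for record in tag_lines(sys.stdin, param):
        print(record)


if __name__ == "__main__":
    main()
```

You can try it without a cluster by setting the variable yourself, for example: MY_PARAM=thing_I_need python mapper.py < input.txt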


Source: https://stackoverflow.com/questions/31833045/hadoop-global-variable-with-streaming
