Call mapper when reducer is done

那年仲夏 submitted on 2020-01-06 14:45:33

Question


I am executing the job as:

hadoop/bin/./hadoop jar /home/hadoopuser/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar \
    -D mapred.reduce.tasks=2 \
    -file kmeans_mapper.py  -mapper kmeans_mapper.py \
    -file kmeans_reducer.py -reducer kmeans_reducer.py \
    -input gutenberg/small_train.csv -output gutenberg/out

When the two reducers are done, I would like to do something with the results, so ideally I would like to call another file (another mapper?) which would receive the output of the reducers as its input. How to do that easily?

I checked this blog, which has an mrjob example, but it doesn't explain the steps; I can't work out how to apply it to my case.

The MapReduce tutorial states:

Users may need to chain MapReduce jobs to accomplish complex tasks which cannot be done via a single MapReduce job. This is fairly easy since the output of the job typically goes to distributed file-system, and the output, in turn, can be used as the input for the next job.

but it doesn't give any example...
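With streaming, "chaining" in the sense of the tutorial just means running hadoop jar twice and pointing the second job's -input at the first job's -output directory on HDFS. A minimal sketch of that, where postprocess_mapper.py is a hypothetical second-stage script standing in for "another mapper":

```shell
# Job 1: the k-means map/reduce from the question.
hadoop jar hadoop-streaming-2.6.0.jar \
    -D mapred.reduce.tasks=2 \
    -file kmeans_mapper.py  -mapper kmeans_mapper.py \
    -file kmeans_reducer.py -reducer kmeans_reducer.py \
    -input gutenberg/small_train.csv -output gutenberg/out

# Job 2: reads the part-* files that job 1 wrote under gutenberg/out.
hadoop jar hadoop-streaming-2.6.0.jar \
    -file postprocess_mapper.py -mapper postprocess_mapper.py \
    -input gutenberg/out -output gutenberg/out2
```

Each job must write to a fresh output directory; Hadoop refuses to overwrite an existing one.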

Here is some code in Java I could understand, but I am writing Python! :/


This question sheds some light: Chaining multiple mapreduce tasks in Hadoop streaming


Answer 1:


It is possible to do what you're asking using the Java API, as in the example you found.

But you are using the streaming API, which simply reads from standard input and writes to standard output. There is no callback to tell you a MapReduce job has completed, other than the hadoop jar command itself exiting, and the command exiting doesn't by itself indicate success. That said, without some extra tooling around the streaming API it really isn't possible.
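What you can do from a driver script is check the exit status of hadoop jar and only launch the next stage when the first returned 0 (non-zero means the job failed or was killed). A sketch, again using a hypothetical postprocess_mapper.py as the second stage:

```shell
#!/bin/sh
# Run the first streaming job; hadoop jar exits non-zero on failure.
if hadoop jar hadoop-streaming-2.6.0.jar \
    -D mapred.reduce.tasks=2 \
    -file kmeans_mapper.py  -mapper kmeans_mapper.py \
    -file kmeans_reducer.py -reducer kmeans_reducer.py \
    -input gutenberg/small_train.csv -output gutenberg/out
then
    # First job succeeded: feed its output directory to the next stage.
    hadoop jar hadoop-streaming-2.6.0.jar \
        -file postprocess_mapper.py -mapper postprocess_mapper.py \
        -input gutenberg/out -output gutenberg/out2
else
    echo "first job failed; not launching stage 2" >&2
    exit 1
fi
```

This is the "more tooling" mentioned above: the chaining logic lives in the shell script, not in Hadoop itself.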

If the output were written to the local terminal rather than to HDFS, it might be possible to pipe it into the input of another streaming job, but unfortunately the input and output paths of the streaming jar must be paths on HDFS.
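Outside Hadoop entirely, you can still emulate the whole chain locally with pipes for testing, using sort as a stand-in for the shuffle phase. A runnable toy sketch, with inline python one-liners standing in for the real k-means mapper and reducer (which are not reproduced here):

```shell
# Stage 1 mapper: emit "value<TAB>1" per input line (stand-in logic).
# sort groups equal keys together, as the shuffle phase would.
# Stage 1 reducer: count occurrences per key.
printf '1\n2\n2\n3\n' \
  | python3 -c 'import sys; [print(l.strip() + "\t1") for l in sys.stdin]' \
  | sort \
  | python3 -c '
import sys
from itertools import groupby
for key, group in groupby(sys.stdin, key=lambda l: l.split("\t")[0]):
    print(key + "\t" + str(sum(1 for _ in group)))'
# prints: 1<TAB>1, 2<TAB>2, 3<TAB>1 (one pair per line)
```

You can append a further `| python3 next_mapper.py` to this pipeline to test the chained stage the same way before deploying it to the cluster.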



来源:https://stackoverflow.com/questions/35249753/call-mapper-when-reducer-is-done
