Hadoop MapReduce iterate over input values of a reduce call

Question

I'm testing a simple MapReduce application, but I'm stuck trying to understand what happens when I iterate over the input values of a reduce call.

This is the piece of code that behaves strangely:

public void reduce(Text key, Iterable<E> values, Context context)
    throws IOException, InterruptedException {

    Iterator<E> statesIter = values.iterator();
    // Keep a reference to the first value returned by the iterator
    E first = statesIter.next();

    while (statesIter.hasNext()) {
        E state = statesIter.next();

        // Expected to print the same string on every pass
        System.out.println(first.toString());
        // some other stuff
    }
    // some other stuff
}

Nothing unusual here, except that each println call prints a different string. In other words, every time I call next(), the contents of the object referenced by first change.

Why does this happen?

Answer 1:

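It's somewhat counter-intuitive, but it is documented in the API docs: Hadoop reuses the key and value objects while iterating, so each call to next() refills the same instance with the next record's data. A reference taken earlier, such as first, therefore always shows the most recently read value. If you want to keep a value around, clone it.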
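The effect described in this answer can be reproduced without a Hadoop installation. The following is only a minimal sketch: MutableValue and reusingValues are hypothetical names made up for this illustration (they stand in for a Writable such as Text and for the values passed to reduce), and the reuse is simulated rather than taken from Hadoop's own code. It compares holding a bare reference with holding a copy.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ValueReuseDemo {

    /** Hypothetical mutable value type, standing in for a Hadoop Writable such as Text. */
    static final class MutableValue {
        private String content = "";

        MutableValue() {}

        /** Copy constructor: takes a snapshot of the other value's current content. */
        MutableValue(MutableValue other) { this.content = other.content; }

        void set(String content) { this.content = content; }

        @Override
        public String toString() { return content; }
    }

    /** Simulated reuse: every next() refills and returns the SAME object. */
    static Iterable<MutableValue> reusingValues(List<String> data) {
        return () -> new Iterator<MutableValue>() {
            private final MutableValue shared = new MutableValue();
            private final Iterator<String> source = data.iterator();

            @Override
            public boolean hasNext() { return source.hasNext(); }

            @Override
            public MutableValue next() {
                shared.set(source.next());
                return shared;
            }
        };
    }

    /** Keeps only a reference to the first value, as in the question. */
    static String firstWithoutCopy(Iterable<MutableValue> values) {
        Iterator<MutableValue> it = values.iterator();
        MutableValue first = it.next();
        while (it.hasNext()) {
            it.next(); // overwrites the object that 'first' points to
        }
        return first.toString();
    }

    /** Copies the first value before advancing the iterator. */
    static String firstWithCopy(Iterable<MutableValue> values) {
        Iterator<MutableValue> it = values.iterator();
        MutableValue first = new MutableValue(it.next());
        while (it.hasNext()) {
            it.next(); // the copy is unaffected
        }
        return first.toString();
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c");
        System.out.println(firstWithoutCopy(reusingValues(data))); // prints "c"
        System.out.println(firstWithCopy(reusingValues(data)));    // prints "a"
    }
}
```

In an actual reducer the same idea would apply to the real value type: for Text values, new Text(value) makes a copy, and for other Writable types WritableUtils.clone(value, context.getConfiguration()) is one option. Which of these fits depends on the value class E used in the job, which the question does not show.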

Source: https://stackoverflow.com/questions/15976981/hadoop-mapreduce-iterate-over-input-values-of-a-reduce-call

Tags

Hadoop

MapReduce
