Hadoop Reducer Values in Memory?
Question: I'm writing a MapReduce job that may end up with a huge number of values in the reducer. I am concerned about all of these values being loaded into memory at once. Does the underlying implementation of the Iterable<VALUEIN> values load values into memory only as they are needed? Hadoop: The Definitive Guide seems to suggest this is the case, but doesn't give a "definitive" answer. The reducer output will be far larger than the values input, but I believe the output is written to disk as
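
To make the concern concrete, here is a minimal plain-Java sketch of the pattern being asked about: an Iterable that produces each value only when the consumer asks for it, and a reduce-style loop that keeps a running total instead of collecting the values. It deliberately does not use any Hadoop classes. StreamingValuesSketch, LazyValues, produced() and sum() are hypothetical names invented for illustration, and the sketch is not Hadoop's actual implementation. It only shows that a loop over an Iterable does not by itself require every value to be held in memory.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class StreamingValuesSketch {

    /**
     * Hypothetical stand-in for the values passed to a reducer.
     * Each value is generated on demand; nothing is stored in a collection.
     */
    public static final class LazyValues implements Iterable<Long> {
        private final long count;
        private long produced = 0; // number of values handed out so far

        public LazyValues(long count) {
            this.count = count;
        }

        /** Number of values that have actually been materialized. */
        public long produced() {
            return produced;
        }

        @Override
        public Iterator<Long> iterator() {
            // Every iterator shares the same cursor, so the data can only be
            // consumed once (an assumption of this sketch, chosen to model a stream).
            return new Iterator<Long>() {
                @Override
                public boolean hasNext() {
                    return produced < count;
                }

                @Override
                public Long next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return produced++;
                }
            };
        }
    }

    /** Reduce-style loop: keeps only a running total, never the values themselves. */
    public static long sum(Iterable<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        LazyValues values = new LazyValues(1_000_000);
        System.out.println(sum(values));       // prints 499999500000
        System.out.println(values.produced()); // prints 1000000
    }
}
```

The memory risk in a loop like this comes from what the loop body does, for example adding every value to a List, rather than from the iteration itself. Whether a given Hadoop version streams values in the same way is best confirmed against its own Reducer documentation.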