I'm writing a MapReduce job that may end up with a huge number of values per key in the reducer. I am concerned about all of these values being loaded into memory at once.
It's not entirely in memory; some of it comes from disk. Looking at the code, it seems the framework breaks the Iterable into segments and loads them from disk into memory one by one.
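As long as you consume the `Iterable` with a single pass and keep only a running aggregate, memory use stays constant regardless of how many values a key has. Here is a minimal, self-contained sketch of that pattern (plain Java, no Hadoop dependencies; the `reduce` method and its sample input are hypothetical stand-ins for a reducer body, where each call to the iterator's `next()` may pull the next record from a disk-backed segment):

```java
import java.util.Arrays;
import java.util.List;

public class StreamingReducerSketch {

    // Mimics a reducer body: consumes values one at a time,
    // keeping only the running aggregate in memory instead of
    // materializing the whole value list.
    static long reduce(Iterable<Integer> values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical sample input; in Hadoop this Iterable is
        // fed by the framework from sorted, possibly on-disk segments.
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(reduce(values)); // prints 15
    }
}
```

One caveat worth knowing if you do this in a real Hadoop reducer: the framework reuses the value object between iterations, so if you need to keep a value past the current loop step you must copy it, which is exactly the case where memory can grow with the number of values.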