Hadoop Reducer Values in Memory?


故里飘歌 2020-12-30 11:31

I'm writing a MapReduce job that may end up with a huge number of values in the reducer. I am concerned about all of these values being loaded into memory at once.

3 answers

星月不相逢 (original poster)

2020-12-30 12:03
As other users have noted, the entire data set is not loaded into memory at once. Have a look at the following mapred-site.xml parameters, described in the Apache documentation.
```
mapreduce.reduce.merge.inmem.threshold
```
Default value: 1000. The threshold, expressed as a number of files, for the in-memory merge process.
```
mapreduce.reduce.shuffle.merge.percent
```
Default value is 0.66. The usage threshold at which an in-memory merge will be initiated, expressed as a fraction of the total memory allocated to storing in-memory map outputs, as defined by mapreduce.reduce.shuffle.input.buffer.percent.
```
mapreduce.reduce.shuffle.input.buffer.percent
```
Default value is 0.70. The percentage of memory to be allocated from the maximum heap size to storing map outputs during the shuffle.
```
mapreduce.reduce.input.buffer.percent
```
Default value is 0. The percentage of memory, relative to the maximum heap size, to retain map outputs during the reduce. When the shuffle is concluded, any remaining map outputs in memory must consume less than this threshold before the reduce can begin.
```
mapreduce.reduce.shuffle.memory.limit.percent
```
Default value is 0.25. The maximum percentage of the in-memory limit that a single shuffle can consume.
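To get a feel for how these defaults interact, here is a rough back-of-the-envelope sketch in Java. It assumes a hypothetical 1000 MiB reducer heap and simply multiplies the fractions as the descriptions above suggest. The class and method names are made up for illustration; this is not Hadoop's actual code, and the real bookkeeping may differ in detail.

```java
public class ShuffleMemoryEstimate {
    // Default values quoted in the parameter descriptions above.
    static final double INPUT_BUFFER_PERCENT = 0.70; // mapreduce.reduce.shuffle.input.buffer.percent
    static final double MERGE_PERCENT = 0.66;        // mapreduce.reduce.shuffle.merge.percent
    static final double MEMORY_LIMIT_PERCENT = 0.25; // mapreduce.reduce.shuffle.memory.limit.percent

    // Share of the heap set aside for holding map outputs during the shuffle.
    static long shuffleBufferBytes(long maxHeapBytes) {
        return Math.round(maxHeapBytes * INPUT_BUFFER_PERCENT);
    }

    // Buffer usage at which an in-memory merge would be started.
    static long mergeTriggerBytes(long maxHeapBytes) {
        return Math.round(shuffleBufferBytes(maxHeapBytes) * MERGE_PERCENT);
    }

    // Largest amount of the shuffle buffer a single shuffle may take.
    static long singleShuffleLimitBytes(long maxHeapBytes) {
        return Math.round(shuffleBufferBytes(maxHeapBytes) * MEMORY_LIMIT_PERCENT);
    }

    public static void main(String[] args) {
        long heap = 1000L * 1024 * 1024; // hypothetical 1000 MiB reducer heap (assumption)
        System.out.println("shuffle buffer bytes:       " + shuffleBufferBytes(heap));
        System.out.println("merge trigger bytes:        " + mergeTriggerBytes(heap));
        System.out.println("single shuffle limit bytes: " + singleShuffleLimitBytes(heap));
    }
}
```

The point of the arithmetic is that only a bounded slice of the heap is ever used to buffer map outputs; once usage crosses the merge threshold, data is merged and spilled rather than accumulated in memory.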