How to limit the number of mappers

半阙折子戏 2021-01-21 09:41

I explicitly set the number of mappers in my Java program using conf.setNumMapTasks(), but when the job ends, the counter shows that the number of launched …

  •  灰色年华
    2021-01-21 10:06

    According to [Partitioning your job into maps and reduces]:

    The mapred.map.tasks parameter is just a hint to the InputFormat for the number of maps. The default InputFormat behavior is to split the total number of bytes into the right number of fragments. However, in the default case the DFS block size of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set via mapred.min.split.size. Thus, if you expect 10TB of input data and have 128MB DFS blocks, you'll end up with 82k maps, unless your mapred.map.tasks is even larger. Ultimately the InputFormat determines the number of maps.
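    To see why the hint loses to the block size, here is a rough, self-contained model of the split arithmetic. It assumes the old-API (org.apache.hadoop.mapred) FileInputFormat rule splitSize = max(minSize, min(goalSize, blockSize)), with goalSize = totalSize / mapTasksHint, and it simplifies things by treating the input as one large file and ignoring the small slop factor Hadoop applies to the last split. The class and method names (SplitEstimate, estimateMaps, minSplitForTarget) are hypothetical and do not call Hadoop at all.

    ```java
    /**
     * Hypothetical, simplified model of old-API FileInputFormat split sizing.
     * Assumptions: a single large input file, no compression, no split slop.
     */
    public class SplitEstimate {

        // splitSize = max(minSize, min(goalSize, blockSize)); the hint only affects goalSize.
        static long splitSize(long totalSize, int mapTasksHint, long minSize, long blockSize) {
            long goalSize = totalSize / Math.max(1, mapTasksHint);
            return Math.max(minSize, Math.min(goalSize, blockSize));
        }

        // Number of maps = ceil(totalSize / splitSize).
        static long estimateMaps(long totalSize, int mapTasksHint, long minSize, long blockSize) {
            long split = splitSize(totalSize, mapTasksHint, minSize, blockSize);
            return (totalSize + split - 1) / split;
        }

        // Smallest minimum split size that keeps the job at or below targetMaps maps.
        static long minSplitForTarget(long totalSize, long targetMaps) {
            return (totalSize + targetMaps - 1) / targetMaps;
        }

        public static void main(String[] args) {
            long mb = 1024L * 1024L;
            long tb = 1024L * 1024L * mb;
            long total = 10 * tb;   // 10TB of input
            long block = 128 * mb;  // 128MB DFS blocks

            // A hint of 10 maps is effectively ignored: the block size caps each split at 128MB.
            System.out.println(estimateMaps(total, 10, 1L, block)); // prints 81920

            // Raising the minimum split size is what actually lowers the map count.
            long minSplit = minSplitForTarget(total, 1000);
            System.out.println(estimateMaps(total, 10, minSplit, block)); // prints 1000
        }
    }
    ```

    In this model, 81,920 is the "82k maps" from the quote, and the only lever that reduces it is the lower bound on split size (mapred.min.split.size), not the map-count hint.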

    You can learn more about InputFormat in the Hadoop documentation.
