I explicitly specify the number of mappers in my Java program using conf.setNumMapTasks(), but when the job ends, the counter shows that the number of launched map tasks differs from the value I set.
[Partitioning your job into maps and reduces] explains this as follows:
The mapred.map.tasks parameter is just a hint to the InputFormat for the number of maps. The default InputFormat behavior is to split the total number of bytes into the right number of fragments. However, in the default case the DFS block size of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set via mapred.min.split.size. Thus, if you expect 10TB of input data and have 128MB DFS blocks, you'll end up with 82k maps, unless your mapred.map.tasks is even larger. Ultimately the InputFormat determines the number of maps.
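To make the arithmetic in that passage concrete, here is a rough sketch of how the hint, the block size, and the minimum split size interact. It assumes the split size is chosen as max(minSize, min(goalSize, blockSize)), which matches the rule quoted above. The class and method names (SplitEstimate, estimateMaps) are made up for illustration and are not part of Hadoop. It also treats the input as one big file and ignores any slack the real InputFormat applies to the last split, so real jobs can report slightly different counts.

```java
public class SplitEstimate {

    // Block size acts as an upper bound on the split size,
    // the minimum split size (mapred.min.split.size) as a lower bound.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Hypothetical helper: estimates the number of map tasks for a single
    // input of totalBytes, given the requested number of maps (the hint).
    static long estimateMaps(long totalBytes, long blockSize,
                             long minSplitSize, int requestedMaps) {
        // The hint only influences the "goal" size of each split.
        long goalSize = totalBytes / Math.max(requestedMaps, 1);
        long splitSize = computeSplitSize(goalSize, minSplitSize, blockSize);
        // Ceiling division: a trailing partial split still needs a map.
        return (totalBytes + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long tb = 1024L * 1024 * mb;

        // Hint of 2 maps is overridden by the 128MB block size: 81920 maps (the "82k" above).
        System.out.println(estimateMaps(10 * tb, 128 * mb, 1, 2));

        // A hint larger than the block-based count does increase the number of maps.
        System.out.println(estimateMaps(10 * tb, 128 * mb, 1, 200_000));

        // Raising the minimum split size to 256MB halves the count: 40960 maps.
        System.out.println(estimateMaps(10 * tb, 128 * mb, 256 * mb, 2));
    }
}
```

So under these assumptions a small value passed to setNumMapTasks() has no effect; to get fewer maps you would raise the minimum split size instead.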
You can also learn more about InputFormat.