Number of MapReduce tasks

Submitted by 孤街醉人 on 2019-12-07 21:40:16

Question


I need some help about how it is possible to get the correct number of Map and Reduce tasks in my application. Is there any way to discover this number?

Thanks


Answer 1:


It is not possible to determine the exact number of map and reduce tasks for an application before it runs, since task failures, re-attempts, and speculative execution attempts cannot be predicted prior to execution. However, an approximate number of tasks can be derived.

The total number of Map tasks for a MapReduce job depends on its Input files and their FileFormat.
For each input file, splits are computed and one map task per input split will be invoked.

The split size is calculated as:

input_split_size = max(mapreduce.input.fileinputformat.split.minsize, min(mapreduce.input.fileinputformat.split.maxsize, dfs.blocksize))

If the properties

  • mapreduce.input.fileinputformat.split.minsize
  • mapreduce.input.fileinputformat.split.maxsize

    are at their defaults, the input split size for a file will be approximately equal to its block size, provided the file is splittable.

The total number of map tasks will be equal to the sum of the number of input splits per file.
As for the total number of reduce tasks, it is 1 (the default) or the value of mapreduce.job.reduces.
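As a rough illustration, the split-size formula above can be sketched in Python. The default values assumed here (128 MB block size, a minimum split size of 1 byte, and an effectively unbounded maximum split size) reflect typical Hadoop 2.x defaults; this is a sketch, not the actual Hadoop implementation:

```python
import math

def input_split_size(block_size, min_size=1, max_size=2**63 - 1):
    # input_split_size = max(minsize, min(maxsize, blocksize))
    return max(min_size, min(max_size, block_size))

def num_map_tasks(file_sizes, block_size=128 * 1024 * 1024):
    # One map task per input split; splits per file ~= ceil(file_size / split_size),
    # assuming every file is splittable.
    split = input_split_size(block_size)
    return sum(math.ceil(size / split) for size in file_sizes)
```

With the defaults above, a single 300 MB file on 128 MB blocks yields 3 input splits, and therefore 3 map tasks.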




Answer 2:


The number of mappers depends on the HDFS file block size (by default) or on the input split size (if you specify something other than the default).

Suppose you have a 128 MB file and the HDFS block size is 64 MB; then the number of map tasks will be 2, because of the default behaviour.

And if your input split size is 32 MB while the HDFS block size is 64 MB, then the number of map tasks will be 4. So the number of map tasks depends on all three factors defined above.
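The two scenarios above can be checked with the same max/min formula from Answer 1 (a sketch; sizes are in MB for readability, and a default minimum split size of 0 is assumed):

```python
import math

def num_splits(file_mb, block_mb, min_mb=0, max_mb=float("inf")):
    # split size = max(minsize, min(maxsize, blocksize)); one map task per split
    split_mb = max(min_mb, min(max_mb, block_mb))
    return math.ceil(file_mb / split_mb)

# 128 MB file, 64 MB block, default split settings -> 2 map tasks
# 128 MB file, 64 MB block, split size capped at 32 MB -> 4 map tasks
```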

The number of reduce tasks depends on job.setNumReduceTasks(num) or mapreduce.job.reduces (mapred.reduce.tasks is deprecated).
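The reducer count, unlike the mapper count, is set explicitly. A minimal sketch of the precedence, assuming (as is the case in Hadoop) that an explicit API call overrides the configuration property, which in turn overrides the default of 1 — the dict-based config object here is illustrative, not the real Hadoop Configuration class:

```python
def num_reduce_tasks(conf, api_override=None):
    # Explicit Job.setNumReduceTasks(n) wins; otherwise the
    # mapreduce.job.reduces property applies; the default is 1.
    if api_override is not None:
        return api_override
    return int(conf.get("mapreduce.job.reduces", 1))
```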




Answer 3:


The number of map tasks is equal to the number of input splits in any job, so finding either one gives you the other; the number of reducers you can set explicitly. Moreover, once you run the MapReduce job, you can inspect the generated logs to find the number of mappers and reducers used in your job.



Source: https://stackoverflow.com/questions/42424642/number-of-mapreduce-tasks
