SetNumMapTask with a mapreduce.Job

Submitted by 巧了我就是萌 on 2020-01-04 01:58:20

Question


How can I set the number of map tasks with an org.apache.hadoop.mapreduce.Job? The method does not seem to exist... but it does exist for org.apache.hadoop.mapred.JobConf...

Thanks!


Answer 1:


AFAIK, setNumMapTasks is not supported any more.

It was merely a hint to the framework (even in the old API), and doesn't guarantee that you'll get exactly the specified number of maps. Map creation is actually governed by the InputFormat you are using in your job.

You could tweak the following properties as per your needs:

  • mapred.min.split.size

  • mapred.max.split.size

Since you are dealing with small data, setting mapred.max.split.size to a lower value should do the trick. You could use FileInputFormat.setMaxInputSplitSize(Job, long) inside your job to alter this. The long argument is the size of the split in bytes, which you can set to your desired value.
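For example, a minimal sketch of a driver using the new API (assuming Hadoop 2.x, where Job.getInstance is available; the input path and class names are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "small-split-job");
        job.setJarByClass(SplitSizeExample.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));

        // Cap each input split at 1 MB, so the framework creates
        // roughly one map task per megabyte of input.
        FileInputFormat.setMaxInputSplitSize(job, 1024 * 1024);

        // You can also raise the lower bound if splits come out too small:
        // FileInputFormat.setMinInputSplitSize(job, 512 * 1024);
    }
}
```

Note that this only influences how FileInputFormat computes splits; an InputFormat that ignores these properties will not be affected.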

Also, set the HDFS block size to a smaller value for small data using dfs.block.size.
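The block size can be set per file at write time instead of cluster-wide, e.g. via the generic -D option when copying the data into HDFS (a sketch; paths are illustrative, and in Hadoop 2.x the property is named dfs.blocksize):

```shell
# Upload a file with a 1 MB block size instead of the cluster default.
hadoop fs -D dfs.blocksize=1048576 -put smalldata.txt /user/input/
```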



Source: https://stackoverflow.com/questions/19113819/setnummaptask-with-a-mapreduce-job
