Question
How can I set the number of map tasks with an org.apache.hadoop.mapreduce.Job? The method does not seem to exist... but it does exist on org.apache.hadoop.mapred.JobConf...
Thanks!
Answer 1:
AFAIK, setNumMapTasks is no longer supported.
It was only ever a hint to the framework (even in the old API) and does not guarantee that you will get exactly the specified number of maps. Map creation is actually governed by the InputFormat your job uses.
You could tweak the following properties to suit your needs:
mapred.min.split.size
mapred.max.split.size
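For example, these properties can be passed as -D options on the command line (a sketch; the jar, driver class, and paths here are hypothetical, and it assumes your driver goes through ToolRunner so generic options are parsed):

```shell
# Cap each input split at 8 MB so small inputs still produce several map tasks
hadoop jar myjob.jar MyDriver \
  -D mapred.max.split.size=8388608 \
  -D mapred.min.split.size=1 \
  input/ output/
```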
Since you are dealing with small data, setting mapred.max.split.size to a lower value should do the trick. You can call FileInputFormat.setMaxInputSplitSize(Job, long) in your driver to alter this. The long argument is the split size in bytes, which you can set to your desired value.
Also, set the HDFS block size to a smaller value for small data using dfs.block.size.
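To see why lowering the max split size increases the map count, here is a minimal, self-contained sketch of the split-size arithmetic FileInputFormat uses (the 64 MB block size and 32 MB file size are assumed example values, not anything from your cluster):

```java
public class SplitMath {
    // Mirrors FileInputFormat's split formula: max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed 64 MB HDFS block (dfs.block.size)
        long minSize   = 1L;                // mapred.min.split.size
        long maxSize   = 8L * 1024 * 1024;  // mapred.max.split.size lowered to 8 MB
        long fileSize  = 32L * 1024 * 1024; // hypothetical 32 MB input file

        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long numSplits = (fileSize + splitSize - 1) / splitSize; // ceiling division

        System.out.println(splitSize); // 8388608
        System.out.println(numSplits); // 4 -> roughly 4 map tasks for this file
    }
}
```

With the default 64 MB max split size, the same 32 MB file would be a single split and therefore a single map task; capping the split at 8 MB yields four.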
Source: https://stackoverflow.com/questions/19113819/setnummaptask-with-a-mapreduce-job