hadoop: number of reducers remains a constant 4


Question


I'm running a Hadoop job with mapred.reduce.tasks = 100 (just experimenting). The number of maps spawned is 537, as that depends on the input splits. The problem is that the number of reducers running in parallel never goes beyond 4, even after the maps are 100% complete. Is there a way to increase the number of reducers running in parallel? CPU usage is suboptimal and the reduce phase is very slow.

I have also set mapred.tasktracker.reduce.tasks.maximum = 100, but this doesn't seem to affect the number of reducers running in parallel.


Answer 1:


Check the hashcodes used by the partitioner; if your keys only yield 4 distinct hashcode values, all the data will be routed to the same 4 reducers and the rest will sit idle.

You might need to implement your own partitioner to spread the load across more reducers; however, if your mappers produce only 4 distinct keys, 4 is the maximum number of reducers that can do useful work.
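
As a minimal sketch of what such a partitioner could look like (the class name and the Text/IntWritable type parameters are assumptions for illustration, not from the question), this one derives the partition from the key's full string contents instead of whatever hashCode() the key type defines:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical sketch: partition on the key's string contents rather than
// relying on a hashCode() that may collapse to only a few values.
public class SpreadingPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Mask off the sign bit so the modulo result is never negative.
        return (key.toString().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

You would register it on the job with job.setPartitionerClass(SpreadingPartitioner.class). Note that the default HashPartitioner already computes key.hashCode() modulo the reducer count, so a custom partitioner only helps when the key type's hashCode() is the bottleneck.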




Answer 2:


You can specify the number of reducers in the job configuration, like below:

job.setNumReduceTasks(6);

Also, when executing your jar, you can pass the property on the command line, like below:

-D mapred.reduce.tasks=6
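
Note that the -D flag is only picked up if your driver parses generic options via ToolRunner. A minimal driver sketch, assuming a Hadoop 2.x-style API and a hypothetical class name (mapper/reducer wiring omitted for brevity):

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ReducerCountDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already contains any -D overrides parsed by ToolRunner.
        Job job = Job.getInstance(getConf(), "reducer-count-demo");
        job.setJarByClass(ReducerCountDriver.class);
        job.setNumReduceTasks(6); // programmatic alternative to -D
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new ReducerCountDriver(), args));
    }
}

It would then be launched with something like: hadoop jar myjob.jar ReducerCountDriver -D mapred.reduce.tasks=6 input output (jar and path names are illustrative).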




Answer 3:


It turns out all that was required was a restart of the mapred and dfs daemons after changing mapred-site.xml. mapred.tasktracker.reduce.tasks.maximum is indeed the right parameter to set to increase reduce capacity.

I can't understand why Hadoop chooses not to reload mapred-site.xml every time a job is submitted. (The slot maximum is a daemon-level setting that the TaskTracker reads at startup, which is why a restart is needed, whereas per-job properties such as mapred.reduce.tasks are read at submission time.)
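
For reference, a minimal mapred-site.xml sketch (the value shown is illustrative, not from the question):

<!-- Hypothetical mapred-site.xml excerpt -->
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <!-- Concurrent reduce slots per TaskTracker; the MRv1 default is 2,
       which would cap a two-TaskTracker cluster at 4 reducers at once. -->
  <value>8</value>
</property>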



Source: https://stackoverflow.com/questions/13249522/hadoop-number-of-reducers-remains-a-constant-4
