hadoop: number of reducers remains a constant 4


Question


I'm running a Hadoop job with mapred.reduce.tasks = 100 (just experimenting). The number of maps spawned is 537, as that depends on the input splits. The problem is that the number of reducers running in parallel never goes beyond 4, even after the maps are 100% complete. Is there a way to increase the number of reducers running in parallel? CPU usage is suboptimal and the reduce phase is very slow.

I have also set mapred.tasktracker.reduce.tasks.maximum = 100, but this doesn't seem to affect the number of reducers running in parallel.


Answer 1:


Check the hashcodes used by the partitioner; if your keys only yield 4 distinct hashcode values, all the data will be routed to the same 4 reducers and the rest will sit idle.

You might need to implement your own partitioner to spread the load across more reducers; however, if your mappers produce only 4 distinct keys, 4 is the maximum number of reducers that can do useful work.
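
As a minimal sketch of what such a partitioner could look like (the class name and the Text/IntWritable type parameters are assumptions for illustration, not from the question), this one derives the partition from the key's full string contents instead of whatever hashCode() the key type defines:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical sketch: partition on the key's string contents rather than
// relying on a hashCode() that may collapse to only a few values.
public class SpreadingPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Mask off the sign bit so the modulo result is never negative.
        return (key.toString().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

You would register it on the job with job.setPartitionerClass(SpreadingPartitioner.class). Note that the default HashPartitioner already computes key.hashCode() modulo the reducer count, so a custom partitioner only helps when the key type's hashCode() is the bottleneck.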




Answer 2:


You can specify the number of reducers in the job configuration, like below:

job.setNumReduceTasks(6);

Also, when executing your jar, you can pass the property on the command line, like below:

-D mapred.reduce.tasks=6
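
Note that the -D flag is only picked up if your driver parses generic options via ToolRunner. A minimal driver sketch, assuming a Hadoop 2.x-style API and a hypothetical class name (mapper/reducer wiring omitted for brevity):

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ReducerCountDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already contains any -D overrides parsed by ToolRunner.
        Job job = Job.getInstance(getConf(), "reducer-count-demo");
        job.setJarByClass(ReducerCountDriver.class);
        job.setNumReduceTasks(6); // programmatic alternative to -D
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new ReducerCountDriver(), args));
    }
}

It would then be launched with something like: hadoop jar myjob.jar ReducerCountDriver -D mapred.reduce.tasks=6 input output (jar and path names are illustrative).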




Answer 3:


It turns out all that was required was a restart of the mapred and dfs daemons after changing mapred-site.xml. mapred.tasktracker.reduce.tasks.maximum is indeed the right parameter to set to increase reduce capacity.

I can't understand why Hadoop chooses not to reload mapred-site.xml every time a job is submitted. (The slot maximum is a daemon-level setting that the TaskTracker reads at startup, which is why a restart is needed, whereas per-job properties such as mapred.reduce.tasks are read at submission time.)
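
For reference, a minimal mapred-site.xml sketch (the value shown is illustrative, not from the question):

<!-- Hypothetical mapred-site.xml excerpt -->
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <!-- Concurrent reduce slots per TaskTracker; the MRv1 default is 2,
       which would cap a two-TaskTracker cluster at 4 reducers at once. -->
  <value>8</value>
</property>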



Source: https://stackoverflow.com/questions/13249522/hadoop-number-of-reducers-remains-a-constant-4
