Why is the right number of reduces in Hadoop 0.95 or 1.75?
**Question:** The Hadoop documentation states:

> The right number of reduces seems to be 0.95 or 1.75 multiplied by (`<no. of nodes>` * `mapred.tasktracker.reduce.tasks.maximum`). With 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75 the faster nodes will finish their first round of reduces and launch a second wave of reduces, doing a much better job of load balancing.

Are these values fairly constant? What are the results when you choose a value between these two?
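To make the quoted formula concrete, here is a minimal sketch of the arithmetic, assuming a hypothetical cluster of 10 nodes with `mapred.tasktracker.reduce.tasks.maximum` set to 2 (both numbers are illustrative, not from the documentation):

```java
public class ReduceCountSketch {
    public static void main(String[] args) {
        int nodes = 10;                 // assumed cluster size (hypothetical)
        int maxReduceSlotsPerNode = 2;  // mapred.tasktracker.reduce.tasks.maximum (assumed)
        int totalReduceSlots = nodes * maxReduceSlotsPerNode;

        // 0.95: a single wave -- all reduces fit in the cluster at once,
        // so they can start copying map output as soon as maps finish.
        int singleWave = (int) (0.95 * totalReduceSlots);

        // 1.75: roughly two waves -- faster nodes finish their first
        // reduce and pick up a second, which balances load better.
        int doubleWave = (int) (1.75 * totalReduceSlots);

        System.out.println(singleWave + " " + doubleWave);
        // In a real job you would pass one of these values to
        // JobConf.setNumReduceTasks(...) (or set mapred.reduce.tasks).
    }
}
```

With these assumed numbers the cluster has 20 reduce slots, so 0.95 gives 19 reduces (one wave, one slot spare for a failed task) and 1.75 gives 35 (a second wave for the faster nodes).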