Map Job Performance on a Cluster
Question: Suppose I have 15 blocks of data and two clusters. The first cluster has 5 nodes and a replication factor of 1, while the second has a replication factor of 3. If I run my map job, should I expect any change in its performance or execution time? In other words, how does replication affect the performance of the mappers on a cluster?

Answer 1: When the JobTracker assigns a job's tasks to TaskTrackers, each task is assigned to a particular node based upon the locality of its data in HDFS
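
To make the locality argument concrete, here is a minimal sketch of how the replication factor changes the number of blocks that can be processed on a node that already holds a copy. It is a toy model, not Hadoop's scheduler: the function names `place_blocks` and `data_local_blocks` are hypothetical, replicas are placed round-robin (real HDFS placement is rack-aware and not this regular), both clusters are assumed to have 5 nodes, and nodes 0 and 1 are assumed to be busy with other work.

```python
def place_blocks(num_blocks, num_nodes, replication):
    """Toy round-robin placement (an assumption, not HDFS's real policy):
    block b gets replicas on `replication` consecutive nodes."""
    return {
        b: {(b + i) % num_nodes for i in range(replication)}
        for b in range(num_blocks)
    }


def data_local_blocks(placement, free_nodes):
    """Count blocks that have at least one replica on a node with a free map slot."""
    return sum(1 for replicas in placement.values() if replicas & free_nodes)


# Assumption: nodes 0 and 1 are busy, so only nodes 2, 3 and 4 can take map tasks.
free_nodes = {2, 3, 4}

for replication in (1, 3):
    placement = place_blocks(num_blocks=15, num_nodes=5, replication=replication)
    local = data_local_blocks(placement, free_nodes)
    print(f"replication={replication}: {local} of 15 blocks can be read locally")
```

Under these assumptions, replication 1 leaves 9 of the 15 blocks readable locally, while replication 3 makes all 15 local, because every block then has a copy on at least one free node. The remaining blocks in the first case would have to be read over the network, which is where a difference in map execution time could come from.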