Question
During the shuffle stage in Hadoop, the map output is transferred across the nodes of the cluster according to each reducer's partition. What protocol does Hadoop use to shuffle data across nodes for the reduce stage?
Answer 1:
I laughed the first time I saw it, but the whole shuffling and merging is done by an HTTPServlet. You can see this in the TaskTracker source code, in the nested class MapOutputServlet. It receives an HTTP request carrying the job and task IDs and streams the requested map output over the connection; on the reduce side, the incoming input stream is then written to the local filesystem on disk.
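To make the mechanism concrete, here is a minimal Python sketch of an HTTP-based shuffle between one map-side server and a reducer. It is not Hadoop's code: the /mapOutput path, the job/map/reduce query parameters, MAP_OUTPUTS, fetch_partition and run_demo are hypothetical names chosen for illustration, loosely modeled on the servlet described above.

```python
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import urlopen

# Hypothetical map-side state: (job_id, map_id) -> one bytes partition per reducer.
MAP_OUTPUTS = {
    ("job_1", "map_0"): [b"apple\t1\n", b"banana\t1\n"],
    ("job_1", "map_1"): [b"apple\t1\n", b"cherry\t1\n"],
}


class MapOutputHandler(BaseHTTPRequestHandler):
    """Map side: serves one reduce partition of one map task's output over HTTP."""

    def do_GET(self):
        url = urlparse(self.path)
        params = parse_qs(url.query)
        try:
            if url.path != "/mapOutput":  # assumed endpoint name
                raise KeyError(url.path)
            key = (params["job"][0], params["map"][0])
            reduce_id = int(params["reduce"][0])
            data = MAP_OUTPUTS[key][reduce_id]
        except (KeyError, IndexError, ValueError):
            self.send_error(404, "unknown map output")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def fetch_partition(host, port, job_id, map_id, reduce_id, dest_dir):
    """Reduce side: GET one partition and copy the response stream to a local file."""
    query = urlencode({"job": job_id, "map": map_id, "reduce": reduce_id})
    path = os.path.join(dest_dir, f"{map_id}_r{reduce_id}.out")
    with urlopen(f"http://{host}:{port}/mapOutput?{query}") as resp, open(path, "wb") as out:
        while chunk := resp.read(64 * 1024):  # stream in 64 KiB chunks
            out.write(chunk)
    return path


def run_demo(reduce_id):
    """Start the map-side server, then fetch every map's partition for one reducer."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), MapOutputHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        with tempfile.TemporaryDirectory() as tmp:
            segments = []
            for job_id, map_id in MAP_OUTPUTS:
                path = fetch_partition("127.0.0.1", port, job_id, map_id, reduce_id, tmp)
                with open(path, "rb") as f:
                    segments.append(f.read())
            # Segments are simply concatenated here; a real reducer merge-sorts them.
            return b"".join(segments)
    finally:
        server.shutdown()
        server.server_close()
```

Calling run_demo(0) fetches reducer 0's partition from both map tasks. Plain HTTP GETs keep the transfer simple and let the map side stream files straight from disk; the sketch skips everything a production shuffle adds on top, such as sorting, merging and retries.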
Source: https://stackoverflow.com/questions/8285217/hadoop-shuffle-uses-which-protocol