Where is Mapper output saved in Hadoop?

拥有回忆 posted on 2019-12-09 21:18:21

Question


I am interested in managing Hadoop shuffle traffic efficiently and using network bandwidth effectively. To do this, I want to know how much shuffle traffic each DataNode generates. Shuffle traffic is essentially the output of the mappers. So where is this mapper output saved? How can I get the size of the mapper output from each DataNode in real time? I appreciate your help.

I have configured a directory to store this mapper output, as below:

 <property>
 <name>mapred.local.dir</name>
 <value>/app/hadoop/tmp/myoutput</value>
 </property>                     

and I looked at its contents:

 hduser@dn4:/app/hadoop/tmp/myoutput$ ls -lrt
 total 16
 drwxr-xr-x 2 hduser hadoop 4096 Dec 12 10:50 tt_log_tmp
 drwx------ 3 hduser hadoop 4096 Dec 12 10:53 ttprivate
 drwxr-xr-x 3 hduser hadoop 4096 Dec 12 10:53 taskTracker
 drwxr-xr-x 4 hduser hadoop 4096 Dec 12 13:25 userlogs  

but I could not find any mapper output there when I ran the MapReduce job.

Thanks


Answer 1:


The output of the map tasks is stored on the local disk of the node that ran each mapper, not in HDFS. Once a map task finishes, this local output is transferred to the reducers. Check your $HADOOP_HOME/conf/mapred-site.xml to see where mapper output is stored:

<property>
    <name>mapred.local.dir</name>
    <value>$DIR</value>
</property>
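
To get a rough size of the mapper output on one node, you could scan that directory while the job is running. The following is only a sketch under assumptions that are not in the question: that the final map output file of each task attempt is named `file.out` somewhere below `mapred.local.dir` (as I believe is the case for Hadoop 1.x), and that these intermediate files are deleted once the job completes, which would explain an empty listing afterwards. The function name `map_output_bytes` is made up for illustration.

```python
import os
import sys


def map_output_bytes(local_dir, filename="file.out"):
    """Sum the sizes (in bytes) of all files named `filename` under local_dir.

    Assumption: `file.out` is the name Hadoop uses for a finished map
    task's output; pass a different name if your version differs.
    """
    total = 0
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            if name != filename:
                continue
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # The file may disappear between listing and stat,
                # e.g. when the job's local files are cleaned up.
                pass
    return total


if __name__ == "__main__":
    # Default path taken from the question's mapred.local.dir value.
    path = sys.argv[1] if len(sys.argv) > 1 else "/app/hadoop/tmp/myoutput"
    print(map_output_bytes(path))
```

Run it periodically on each node during the job (as a user that can read the directory) to approximate how much map output that node is holding at that moment. It reports bytes on disk, so if map output compression is enabled it reflects the compressed size.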


Source: https://stackoverflow.com/questions/27437964/where-mapper-output-in-hadoop-is-saved
