Hadoop/MR temporary directory


Question


I've been struggling to get Hadoop and Map/Reduce to use a separate temporary directory instead of /tmp on my root partition.

I've added the following to my core-site.xml config file:

<property>
    <name>hadoop.tmp.dir</name>
    <value>/data/tmp</value>
</property>

I've added the following to my mapreduce-site.xml config file:

<property>
    <name>mapreduce.cluster.local.dir</name>
    <value>${hadoop.tmp.dir}/mapred/local</value>
</property>
<property>
    <name>mapreduce.jobtracker.system.dir</name>
    <value>${hadoop.tmp.dir}/mapred/system</value>
</property>
<property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <value>${hadoop.tmp.dir}/mapred/staging</value>
</property>
<property>
   <name>mapreduce.cluster.temp.dir</name>
   <value>${hadoop.tmp.dir}/mapred/temp</value>
</property>

No matter what job I run, though, it still does all of the intermediate work in /tmp. I've been watching it happen via df -h, and when I look in that directory, all of the temporary files it creates are there.
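
For reference, this is roughly how I've been checking (the hadoop pattern below is an assumption based on Hadoop's default temp-directory naming; adjust for your user and layout):

# Watch free space on the root and data mounts
df -h / /data

# List the Hadoop temp files that keep appearing under /tmp
ls -lh /tmp | grep -i hadoop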

Am I missing something from the config?

This is on a 10-node CentOS Linux cluster running Hadoop/YARN MapReduce 2.1.0.2.0.6.0.

EDIT: After some further research, the settings seem to be working on my management and namenode/secondary namenode boxes. It is only on the datanodes that this is not working, and it is only the MapReduce temporary output files that are still going to /tmp on my root drive rather than the data mount I specified in the configuration files.
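
One thing worth checking on an affected datanode is which value the daemons there actually resolve; a quick sketch, assuming the stock hdfs CLI is on the PATH (it prints the key as resolved from whatever config directory is visible on that node):

# Print the effective value of hadoop.tmp.dir on this node
hdfs getconf -confKey hadoop.tmp.dir

# And the MapReduce local dir (property name per mapred-default.xml)
hdfs getconf -confKey mapreduce.cluster.local.dir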


Answer 1:


If you are running Hadoop 2.0, then the proper name of the config file you need to change is mapred-site.xml, not mapreduce-site.xml.

An example can be found on the Apache site: http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

It uses the mapreduce.cluster.local.dir property name, with a default value of ${hadoop.tmp.dir}/mapred/local.

Try renaming your mapreduce-site.xml file to mapred-site.xml in your /etc/hadoop/conf/ directories and see if that fixes it.
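
A minimal sketch of that rename, assuming the standard /etc/hadoop/conf layout; restart the YARN NodeManager afterwards so the renamed file is actually reread (daemon script locations vary by distribution):

# On each affected node, rename the misnamed config file
mv /etc/hadoop/conf/mapreduce-site.xml /etc/hadoop/conf/mapred-site.xml

# Restart the NodeManager so the new file is picked up
yarn-daemon.sh stop nodemanager
yarn-daemon.sh start nodemanager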

If you are using Ambari, you should be able to just use the "Add Property" button in the MapReduce2 / Custom mapred-site.xml section, enter mapreduce.cluster.local.dir for the property name, and give a comma-separated list of the directories you want to use as the value.
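
For example, the resulting entry in mapred-site.xml might look like the following (the /data/tmp and /data2/tmp paths are placeholders; substitute your own mounts):

<property>
    <name>mapreduce.cluster.local.dir</name>
    <value>/data/tmp/mapred/local,/data2/tmp/mapred/local</value>
</property>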




Answer 2:


I think you need to specify this property in hdfs-site.xml rather than core-site.xml. Try setting it there; I hope this solves your problem.




Answer 3:


The mapreduce properties should be in mapred-site.xml.




Answer 4:


I was facing a similar issue where some nodes would not honor the hadoop.tmp.dir set in the config.

A reboot of the misbehaving nodes fixed it for me.



Source: https://stackoverflow.com/questions/20644725/hadoop-mr-temporary-directory
