问题
I've been struggling with getting Hadoop and Map/Reduce to start using a separate temporary directory instead of the /tmp on my root directory.
I've added the following to my core-site.xml config file:
<property>
<name>hadoop.tmp.dir</name>
<value>/data/tmp</value>
</property>
I've added the following to my mapreduce-site.xml config file:
<property>
<name>mapreduce.cluster.local.dir</name>
<value>${hadoop.tmp.dir}/mapred/local</value>
</property>
<property>
<name>mapreduce.jobtracker.system.dir</name>
<value>${hadoop.tmp.dir}/mapred/system</value>
</property>
<property>
<name>mapreduce.jobtracker.staging.root.dir</name>
<value>${hadoop.tmp.dir}/mapred/staging</value>
</property>
<property>
<name>mapreduce.cluster.temp.dir</name>
<value>${hadoop.tmp.dir}/mapred/temp</value>
</property>
No matter what job I run though, it's still doing all of the intermediate work out in the /tmp directory. I've been watching it do it via df -h and when I go in there, there are all of the temporary files it creates.
Am I missing something from the config?
This is on a 10 node Linux CentOS cluster running 2.1.0.2.0.6.0 of Hadoop/Yarn Mapreduce.
EDIT: After some further research, the settings seem to be working on my management and namednode/secondarynamed nodes boxes. It is only on the data nodes that this is not working and it is only with the mapreduce temporary output files that are still going to /tmp on my root drive, not the my data mount where I have set in the configuration files.
回答1:
If you are running Hadoop 2.0, then the proper name of the config file you need to change is mapred-site.xml
, not mapreduce-site.xml
.
An example can be found on the Apache site: http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
and it uses the mapreduce.cluster.local.dir
property name, with a default value of ${hadoop.tmp.dir}/mapred/local
Try renaming your mapreduce-site.xml
file to mapred-site.xml
in your /etc/hadoop/conf/
directories and see if that fixes it.
If you are using Ambari, you should be able to just go to use the "Add Property" button on the MapReduce2 / Custom mapred-site.xml section, enter 'mapreduce.cluster.local.dir' for the property name, and a comma separated list of directories you want to use.
回答2:
I think you need to specify this property in hdfs-site.xml rather than core-site.xml.Try setting this property in hdfs-site.xml. I hope this will solve your problem
回答3:
The mapreduce properties should be in mapred-site.xml.
回答4:
I was facing a similar issue where some nodes would not honor the hadoop.tmp.dir set in the config.
A reboot of the misbehaving nodes fixed it for me.
来源:https://stackoverflow.com/questions/20644725/hadoop-mr-temporary-directory