Question
I have two Hadoop instances running inside two LXC containers on the same host: hadoop-master and hadoop-slave1. When I start YARN and DFS on the master, hadoop-slave1 is reported as UNHEALTHY.
From what I've found on the web, the cause must be one of these two:
- Not enough disk space.
- A permission issue.
a. df -h says otherwise:
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5        91G   68G   19G  79% /
none            4,0K     0  4,0K   0% /sys/fs/cgroup
udev            3,8G  4,0K  3,8G   1% /dev
tmpfs           769M  1,3M  768M   1% /run
none            5,0M     0  5,0M   0% /run/lock
none            3,8G  536K  3,8G   1% /run/shm
none            100M   48K  100M   1% /run/user
b. ll /usr/local/hadoop:
............
drwxr-xr-x 2 hduser hadoop 4096 Mar 8 19:10 local/
ll /usr/local/hadoop/logs:
total 132
drwxr-xr-x 3 hduser hadoop 4096 Mar 8 18:55 ./
drwxr-xr-x 12 hduser hadoop 4096 Mar 8 18:54 ../
-rw-r--r-- 1 hduser hadoop 46222 Mar 8 18:55 hadoop-hduser-datanode-hadoop-slave1.log
-rw-r--r-- 1 hduser hadoop 718 Mar 8 18:55 hadoop-hduser-datanode-hadoop-slave1.out
-rw-r--r-- 1 hduser hadoop 0 Mar 8 17:08 SecurityAuth-hduser.audit
drwxr-xr-x 2 hduser hadoop 4096 Mar 8 19:10 userlogs/
-rw-r--r-- 1 hduser hadoop 56645 Mar 8 19:08 yarn-hduser-nodemanager-hadoop-slave1.log
-rw-r--r-- 1 hduser hadoop 702 Mar 8 18:56 yarn-hduser-nodemanager-hadoop-slave1.out
My yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value> org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop-master:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop-master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop-master:8050</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:///usr/local/hadoop/local</value>
  </property>
</configuration>
And this is what the ResourceManager reports:
INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Node hadoop-slave1:48673 reported UNHEALTHY with details: 1/1 local-dirs are bad: /usr/local/hadoop/local; 1/1 log-dirs are bad: /usr/local/hadoop/logs/userlogs
The advice I found says I need to free up some disk space, but doesn't an LXC container see the same disk space as the host OS? Any ideas? Thanks.
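To rule out disk space from inside the container, a minimal sketch like the one below could be run there. It assumes the NodeManager marks a disk as bad once utilization exceeds a cutoff, which I believe is set by yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage and defaults to 90%. Both that threshold and the helper names (utilization_percent, over_threshold) are assumptions for illustration, not something taken from my configuration. The 79% shown by df above would be under that limit.

```python
import shutil

# Assumed cutoff: believed to be the default of
# yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage.
MAX_UTILIZATION_PERCENT = 90.0


def utilization_percent(path):
    """Return the used percentage of the filesystem holding `path`,
    computed as 100 minus the share of space available to the current user."""
    usage = shutil.disk_usage(path)
    return 100.0 - 100.0 * usage.free / usage.total


def over_threshold(percent, limit=MAX_UTILIZATION_PERCENT):
    """Return True if `percent` is strictly above the assumed cutoff."""
    return percent > limit


if __name__ == "__main__":
    pct = utilization_percent("/")
    print("used: %.1f%%, over limit: %s" % (pct, over_threshold(pct)))
```

Running this inside the container and on the host should show whether both see the same filesystem usage.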
Edit:
I completely overlooked this in the NodeManager error log:
2015-03-10 08:15:17,671 WARN org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection: Directory /usr/local/hadoop/local error, Directory is not executable: /usr/local/hadoop/local, removing from list of valid directories
2015-03-10 08:15:17,671 WARN org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection: Directory /usr/local/hadoop/logs/userlogs error, Directory is not executable: /usr/local/hadoop/logs/userlogs, removing from list of valid directories
2015-03-10 08:15:17,671 INFO org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Disk(s) failed: 1/1 local-dirs are bad: /usr/local/hadoop/local; 1/1 log-dirs are bad: /usr/local/hadoop/logs/userlogs
2015-03-10 08:15:17,671 ERROR org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Most of the disks failed. 1/1 local-dirs are bad: /usr/local/hadoop/local; 1/1 log-dirs are bad: /usr/local/hadoop/logs/userlogs
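Since the log complains that the directories are "not executable", a quick check of the effective permissions could look like the sketch below. It assumes the NodeManager needs read, write and execute access on each directory, and that the script is run as the same user that runs the NodeManager (hduser here). The function name check_dir is made up for illustration. Note that when run as root, os.access will typically report everything as accessible, so the result would be misleading.

```python
import os


def check_dir(path):
    """Return a list of problems found for `path`; an empty list means
    the current user can read, write and traverse the directory."""
    if not os.path.isdir(path):
        return ["not a directory or does not exist"]
    problems = []
    checks = ((os.R_OK, "readable"), (os.W_OK, "writable"), (os.X_OK, "executable"))
    for flag, name in checks:
        # os.access tests the permission for the user running this script.
        if not os.access(path, flag):
            problems.append("not " + name)
    return problems


if __name__ == "__main__":
    # The two directories named in the NodeManager log.
    for d in ("/usr/local/hadoop/local", "/usr/local/hadoop/logs/userlogs"):
        print(d, check_dir(d) or "ok")
```

If this prints "ok" for both directories while the NodeManager still rejects them, the permissions shown by ll are probably not the whole story.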
Source: https://stackoverflow.com/questions/28949872/hadoop-in-lxc-container-error-yarn-1-1-local-dirs-are-bad