I set up and configured a multi-node Hadoop cluster using this tutorial.
When I run the start-all.sh command, it shows all the processes initializing properly, as follows:
Step 1: Run stop-all.sh
Step 2: Go to this path:
cd /usr/local/hadoop/bin
Step 3: Run this command:
hadoop datanode
Now the DataNode works.
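Putting the steps together, a minimal sketch (assuming Hadoop lives at /usr/local/hadoop and its scripts are on your PATH):

# Stop all running daemons first
stop-all.sh
# Go to the Hadoop bin directory
cd /usr/local/hadoop/bin
# Start the DataNode in the foreground; its log output prints to this terminal
hadoop datanode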
Even after removing and remaking the directories, the DataNode wasn't starting.
So I started it manually using bin/hadoop datanode.
It never reached any conclusion and kept running in the foreground. I opened another terminal as the same user and ran jps,
and it showed me the running DataNode process.
It's working, but I just have to keep the unfinished terminal open alongside.
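If you don't want to keep that terminal open, you can start the DataNode as a background daemon instead; a sketch, assuming a Hadoop 2.x layout where the daemon script lives in sbin:

# Start the DataNode detached from the terminal; it logs to $HADOOP_HOME/logs
sbin/hadoop-daemon.sh start datanode
# Confirm it is running
jps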
In case of Mac OS (pseudo-distributed mode):
Open a terminal and run:
cd /tmp
rm -rf hadoop*
bin/hdfs namenode -format
sbin/start-dfs.sh
Note that wiping /tmp/hadoop* and reformatting the NameNode erases any existing HDFS data.
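After start-dfs.sh finishes, a quick sanity check that the DataNode actually came up:

# Should list NameNode, DataNode, and SecondaryNameNode among the running JVMs
jps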
Delete the datanode directory under your Hadoop folder, then rerun start-all.sh.
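A sketch of this, assuming the DataNode data directory (the dfs.datanode.data.dir value in hdfs-site.xml) sits under the Hadoop folder; the exact path below is hypothetical, so substitute your own:

# Remove the DataNode's data directory so it starts fresh on the next launch
rm -rf /usr/local/hadoop/hadoopdata/dfs/data
# Restart all daemons
start-all.sh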
Please check whether the tmp directory property in core-site.xml points to a valid directory:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/data/tmp</value>
</property>
If the directory is misconfigured, the DataNode process will not start properly.
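A quick way to verify, assuming the value above and that Hadoop runs as the hduser user (the group name below is an assumption):

# Confirm the directory exists and is writable by the Hadoop user
ls -ld /home/hduser/data/tmp
# If it is missing, create it and hand it over to hduser
mkdir -p /home/hduser/data/tmp
chown -R hduser:hadoop /home/hduser/data/tmp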
You need to follow three steps.
(1) Go to the logs and check the most recent one (e.g. hadoop-2.6.0/logs/hadoop-user-datanode-ubuntu.log).
If the error reads:
java.io.IOException: Incompatible clusterIDs in /home/kutty/work/hadoop2data/dfs/data: namenode clusterID = CID-c41df580-e197-4db6-a02a-a62b71463089; datanode clusterID = CID-a5f4ba24-3a56-4125-9137-fa77c5bb07b1
i.e., the NameNode cluster ID and the DataNode cluster ID are not identical.
(2) Copy the NameNode clusterID, which is CID-c41df580-e197-4db6-a02a-a62b71463089 in the error above.
(3) Replace the DataNode cluster ID with the NameNode cluster ID in hadoopdata/dfs/data/current/VERSION:
clusterID=CID-c41df580-e197-4db6-a02a-a62b71463089
Restart Hadoop. The DataNode will now run.
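A sketch of step (3) as a one-liner, assuming GNU sed on Linux and that your DataNode data directory is hadoopdata/dfs/data (adjust to match your dfs.datanode.data.dir):

# Stop the daemons before editing the VERSION file
stop-all.sh
# Overwrite the DataNode's clusterID with the NameNode's
sed -i 's/^clusterID=.*/clusterID=CID-c41df580-e197-4db6-a02a-a62b71463089/' hadoopdata/dfs/data/current/VERSION
# Restart; the DataNode should now join the cluster
start-all.sh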