How to sync Hadoop configuration files to multiple nodes?

Submitted by 北城以北 on 2020-01-03 02:54:51

Question


I used to manage a cluster of only 3 CentOS machines running Hadoop, so scp was enough for me to copy the configuration files to the other 2 machines.

However, I now have to set up a Hadoop cluster of more than 10 machines, and it is really frustrating to sync the files so many times using scp.

I want to find a tool that lets me easily sync the files to all machines, with the machine names defined in a config file, such as:

node1
node2
...
node10

Thanks.


Answer 1:


If you do not want to use ZooKeeper, you can modify your hadoop script in $HADOOP_HOME/bin/hadoop and add something like:

if [ "$COMMAND" == "deployConf" ]; then
  for HOST in `cat $HADOOP_HOME/conf/slaves`
    do
       scp $HADOOP_HOME/conf/mapred-site.xml $HOST:$HADOOP_HOME/conf
       scp $HADOOP_HOME/conf/core-site.xml $HOST:$HADOOP_HOME/conf
       scp $HADOOP_HOME/conf/hdfs-site.xml $HOST:$HADOOP_HOME/conf
    done
    exit 0
fi

That's what I'm using now and it does the job.
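
Since the stock bin/hadoop script takes COMMAND from its first argument, the new subcommand would then be invoked from the master node like this:

# Push the three config files to every host in conf/slaves
hadoop deployConf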




Answer 2:


Use ZooKeeper with Hadoop.

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.

Reference: http://wiki.apache.org/hadoop/ZooKeeper
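
As a rough sketch of the idea (not part of the original answer; the ensemble address zk1:2181, the znode path, and the version strings are placeholders): a common pattern is to keep the configuration, or at least a version marker for it, in a znode that every node reads or watches.

# A minimal sketch, assuming a ZooKeeper ensemble reachable at zk1:2181 (hypothetical)
# Publish a version marker for the current cluster config:
zkCli.sh -server zk1:2181 create /hadoop-conf-version "v1"
# Each node can read (or watch) the znode and re-fetch the config files when it changes:
zkCli.sh -server zk1:2181 get /hadoop-conf-version
# After editing the config on the master, bump the marker:
zkCli.sh -server zk1:2181 set /hadoop-conf-version "v2"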




Answer 3:


You have several options here. One is a tool like rsync: the Hadoop control scripts can already distribute configuration files to all nodes of the cluster with rsync (if HADOOP_MASTER is set in hadoop-env.sh, each worker rsyncs the Hadoop tree from that master when its daemon starts); see the sketch below for a manual push. Alternatively, you can use tools like Cloudera Manager or Apache Ambari if you need a more sophisticated way to achieve this.
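
If you just want a one-shot push from the master, a minimal stand-alone rsync loop (a sketch, assuming passwordless SSH and the conventional conf/slaves host list) would be:

# Push the local conf directory to every host listed in conf/slaves
for host in $(cat "$HADOOP_HOME/conf/slaves"); do
  rsync -az "$HADOOP_HOME/conf/" "$host:$HADOOP_HOME/conf/"
done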




Answer 4:


If you use IBM InfoSphere BigInsights, there is a script for this: syncconf.sh.



Source: https://stackoverflow.com/questions/18399863/how-to-sync-hadoop-configuration-files-to-multiple-nodes
