Question
I used to manage a cluster of only 3 CentOS machines running Hadoop, so scp
was enough for me to copy the configuration files to the other 2 machines.
However, I now have to set up a Hadoop cluster of more than 10 machines, and it is really frustrating to sync the files so many times using scp.
I want to find a tool that lets me easily sync the files to all machines. The machine names are defined in a config file, such as:
node1
node2
...
node10
Thanks.
Answer 1:
If you do not want to use ZooKeeper, you can modify your hadoop script in $HADOOP_HOME/bin/hadoop
and add something like:
if [ "$COMMAND" == "deployConf" ]; then
for HOST in `cat $HADOOP_HOME/conf/slaves`
do
scp $HADOOP_HOME/conf/mapred-site.xml $HOST:$HADOOP_HOME/conf
scp $HADOOP_HOME/conf/core-site.xml $HOST:$HADOOP_HOME/conf
scp $HADOOP_HOME/conf/hdfs-site.xml $HOST:$HADOOP_HOME/conf
done
exit 0
fi
That's what I'm using now and it does the job.
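With that block in place, the new subcommand can be run from the master like any other hadoop command (assuming passwordless SSH to the hosts in conf/slaves is already set up):

$HADOOP_HOME/bin/hadoop deployConf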
Answer 2:
Use ZooKeeper with Hadoop.
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
Reference: http://wiki.apache.org/hadoop/ZooKeeper
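As a minimal sketch of that idea (assuming a ZooKeeper server is already reachable at zk1:2181 and that a /hadoop-conf path is yours to create), you could publish a configuration file into a znode with the stock zkCli.sh client; in a real setup each node would read the znode back, or watch it for changes, through a ZooKeeper client library rather than the CLI:

# one-time setup: create a parent znode to hold cluster configuration
zkCli.sh -server zk1:2181 create /hadoop-conf ""

# store the contents of core-site.xml in a child znode
zkCli.sh -server zk1:2181 create /hadoop-conf/core-site.xml "$(cat $HADOOP_HOME/conf/core-site.xml)"

# later, replace the stored copy after editing the file
zkCli.sh -server zk1:2181 set /hadoop-conf/core-site.xml "$(cat $HADOOP_HOME/conf/core-site.xml)"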
Answer 3:
You have several options to do that. One way is to use tools like rsync. The Hadoop control scripts can distribute configuration files to all nodes of the cluster using rsync. Alternatively, you can make use of tools like Cloudera Manager or Ambari if you need a more sophisticated way to achieve that.
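If you want to script the rsync approach yourself, a rough sketch (assuming the same $HADOOP_HOME layout on every host, passwordless SSH, and the host list kept in the conf/slaves file mentioned above) might look like this:

#!/bin/bash
# push the whole conf directory to every host listed in the slaves file
for HOST in $(cat "$HADOOP_HOME/conf/slaves")
do
  # -a preserves permissions and timestamps; --delete removes files dropped from the master copy
  rsync -a --delete "$HADOOP_HOME/conf/" "$HOST:$HADOOP_HOME/conf/"
done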
Answer 4:
If you use InfoSphere BigInsights, there is the script syncconf.sh for this.
Source: https://stackoverflow.com/questions/18399863/how-to-sync-hadoop-configuration-files-to-multiple-nodes