New user SSH hadoop

Submitted by ε祈祈猫儿з on 2019-12-06 16:24:21

Question


I'm installing Hadoop on a single-node cluster. Any idea why we need to do the following?

  1. Why do we need SSH access for a new user?

  2. Why should it be able to connect to its own user account?

  3. Why should I set up passwordless SSH for the new user?

  4. When all the nodes are on the same machine, why do they communicate explicitly?

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/


Answer 1:


Why do we need SSH access for a new user?

Because you want to communicate with the user who is running the Hadoop daemons. Note that ssh actually goes from a user (on one machine) to another user (on another machine), not just from machine to machine.

Why should it be able to connect to its own user account?

Because you want to start all the daemons with a single command. Otherwise you would have to start each daemon individually, issuing a separate command for each one. ssh is required for this, even if you are on a single machine.
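
For a concrete picture, here is the difference, assuming a classic Hadoop 1.x layout (the script names below are from that era; adjust paths for your version):

# one command, which uses ssh (to localhost on a single-node setup) to reach each daemon
$ bin/start-all.sh

# versus starting every daemon by hand, one command at a time
$ bin/hadoop-daemon.sh start namenode
$ bin/hadoop-daemon.sh start datanode
$ bin/hadoop-daemon.sh start secondarynamenode
$ bin/hadoop-daemon.sh start jobtracker
$ bin/hadoop-daemon.sh start tasktracker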

Why should I set up passwordless SSH for the new user?

Because you don't want to enter a password every time you start your Hadoop daemons. That would be irritating, right?
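
Setting that up is quick. A minimal sketch, run as the Hadoop user (this is essentially what the linked tutorial does):

# generate an RSA key pair with an empty passphrase
$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
# authorize that key for the same account, so the user can ssh to itself
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
# verify: this should log you in without a password prompt
$ ssh localhost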

When all the nodes are on the same machine, why do they communicate explicitly?

What do you mean by explicitly? Remember, ssh is not used for communication between the processes; all of that happens over TCP/IP. ssh is only required by the Hadoop scripts, so that you can start all the daemons from one machine without having to log in to each machine and start each process there yourself.
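
To illustrate the TCP/IP side: the daemons find each other through configured host:port addresses, not through ssh. For example, the linked tutorial points everything at the NameNode with a setting like this in conf/core-site.xml (the port number is just the tutorial's choice):

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
</property>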

HTH




Answer 2:


It's not mandatory that you set up passwordless ssh among the nodes or on your local machine. Hadoop mainly uses HTTP for data transfers across nodes when required.

Passwordless ssh access is required (among nodes) so that the start-all.sh, start-dfs.sh, and start-mapred.sh scripts (as far as I can remember) can be used to start and stop the Hadoop daemons in a distributed cluster environment. Otherwise it gets cumbersome to log in to every machine and start/stop the Hadoop daemons there.

You can also use hadoop-daemons.sh or hadoop-daemon.sh to accomplish the same thing, logging in as your hadoop user.
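
To make the distinction concrete, a quick sketch (again, Hadoop 1.x script names):

# start one daemon on the local machine only -- no ssh involved
$ bin/hadoop-daemon.sh start datanode

# run the same action on every host listed in conf/slaves -- this one does use ssh
$ bin/hadoop-daemons.sh start datanode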

The Cloudera Hadoop distribution doesn't even use those scripts; it provides init.d scripts for starting and stopping the Hadoop daemons.
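
For example, something like the following (the exact service names here are an assumption on my part; they vary by CDH version and package):

# start/stop daemons through the init system instead of the Hadoop scripts
$ sudo service hadoop-hdfs-namenode start
$ sudo service hadoop-hdfs-datanode stop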




Answer 3:


slaves.sh is used to start the remote nodes:

# iterate over the hosts in $HOSTLIST (conf/slaves), stripping comments and blank lines
for slave in `cat "$HOSTLIST"|sed  "s/#.*$//;/^$/d"`; do
 # run the requested command on the slave over ssh, in the background,
 # prefixing every output line with the slave's hostname
 ssh $HADOOP_SSH_OPTS $slave $"${@// /\\ }" \
   2>&1 | sed "s/^/$slave: /" &
 # optionally pause between hosts so they are not all hit at once
 if [ "$HADOOP_SLAVE_SLEEP" != "" ]; then
   sleep $HADOOP_SLAVE_SLEEP
 fi
done

It has a dependency on ssh, as you can see. While you could do the entire tutorial without creating a new user or configuring ssh, I would guess that, as a tutorial, that would not give you a good start for when you have to deploy/configure/start/stop a real cluster (i.e. one with remote nodes). As @JteRocker points out, distributions like Cloudera use other scripts to start/stop daemons (but I would guess they still depend on ssh), and a distribution like Hortonworks' Hadoop on Windows uses yet another mechanism (i.e. PowerShell and WinRM instead of ssh).




Answer 4:


Use these commands:

$ sudo addgroup hadoop

If that is not working, then:

$ sudo adduser --ingroup hadoop hduser
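
Putting it together as one sketch (hduser is just the tutorial's example username): create the group and the user, then switch to the new account before setting up its ssh keys:

$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
# become the new user; generate its ssh keys from here
$ su - hduser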



Source: https://stackoverflow.com/questions/17805431/new-user-ssh-hadoop
