Hadoop client and cluster separation

Submitted by 时光毁灭记忆、已成空白 on 2019-12-04 21:43:53

Users shouldn't be able to disrupt the functionality of the cluster; that is the point of the separation. Imagine a whole group of data scientists launching their jobs from one of the cluster's master nodes. If someone runs a memory-intensive operation, the master processes running on that same machine could run out of memory and crash, leaving the whole cluster in a failed state.

If you separate the client node from the master/slave nodes, users could still crash the client, but the cluster would stay up.

First of all, this link has detailed information on how the client communicates with the namenode:

http://www.informit.com/articles/article.aspx?p=2460260&seqNum=2

To my understanding, your professor wants a separate client node from which you can run Hadoop jobs, but that node should not be part of the Hadoop cluster itself.

Consider a scenario where you have to submit a Hadoop job from a client machine that is not part of the existing Hadoop cluster, and the job is expected to execute on that cluster.

The Namenode and Datanodes form the Hadoop cluster, and the client submits jobs to the Namenode. To achieve this, the client must have the same copy of the Hadoop distribution and configuration that is present on the Namenode. Only then will the client know which node the JobTracker is running on, and the IP of the Namenode for accessing HDFS data.

Go through the configuration on the Namenode:

core-site.xml will have this property:

<property>
        <name>fs.default.name</name>
        <value>192.168.0.1:9000</value>
</property> 

mapred-site.xml will have this property:

<property>
      <name>mapred.job.tracker</name>
      <value>192.168.0.1:8021</value>
 </property>

These two important properties must be copied to the client machine's Hadoop configuration. You also need to set one additional property in mapred-site.xml to avoid a PriviledgedActionException:

<property>
      <name>mapreduce.jobtracker.staging.root.dir</name>
      <value>/user</value>
</property>
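
Note that these `<property>` fragments must live inside a `<configuration>` root element. A minimal sketch of assembling the client's copies by hand, using the Namenode values shown above (the `/tmp/hadoop-client-conf` path is illustrative; in practice these files go in the client's Hadoop conf directory):

```shell
# Assemble the client-side config files with the same values as the Namenode.
CONF_DIR=/tmp/hadoop-client-conf
mkdir -p "$CONF_DIR"

# core-site.xml: tells the client where the Namenode (HDFS) is.
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>192.168.0.1:9000</value>
  </property>
</configuration>
EOF

# mapred-site.xml: tells the client where the JobTracker is, plus the
# staging-dir property that avoids the PriviledgedActionException.
cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.0.1:8021</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <value>/user</value>
  </property>
</configuration>
EOF
```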

You also need to update /etc/hosts on the client machine with the IP addresses and hostnames of the Namenode and Datanodes.
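
For example, the client's /etc/hosts might gain entries like the following (the hostnames here are hypothetical; use your cluster's actual IPs and hostnames):

```
192.168.0.1   namenode
192.168.0.2   datanode1
192.168.0.3   datanode2
```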

Now you can submit a job from the client machine with the hadoop jar command, and the job will be executed on the Hadoop cluster. Note that you shouldn't start any Hadoop services on the client machine.
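
For example, submitting the bundled wordcount example from the client (the jar name and HDFS paths are illustrative; adjust them to your Hadoop version and data):

```
hadoop jar hadoop-examples.jar wordcount /user/input /user/output
```

The job runs on the cluster's TaskTrackers, not on the client; the client only packages the job and submits it to the JobTracker.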
