My Spark Worker cannot connect to the Master. Something wrong with Akka?

Posted by 老子叫甜甜 on 2019-11-29 01:38:24

I'm not sure if this is the same issue I encountered, but you may want to try setting SPARK_MASTER_IP to the same address that Spark binds to. In your example it looks like that would be 10.11.52.223 and not tc-52-223.

It should be the same as what you see when you visit the master node's web UI on port 8080, something like: Spark Master at spark://ec2-XX-XX-XXX-XXX.compute-1.amazonaws.com:7077
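For example, a minimal conf/spark-env.sh sketch on the master, assuming 10.11.52.223 from the question is the address Spark binds to (adjust for your own cluster):

# conf/spark-env.sh on the master
SPARK_MASTER_IP=10.11.52.223   # the address the master binds to and advertises
SPARK_MASTER_PORT=7077

Workers would then connect to spark://10.11.52.223:7077.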

AkhlD

If you are getting a "Connection refused" exception, you can resolve it by checking the following:

=> Check that the master is running on the specified host:

netstat -at | grep 7077

You will get something similar to:

tcp        0      0 akhldz.master.io:7077 *:*             LISTEN  

If that is the case, then from your worker machine run host akhldz.master.io (replace akhldz.master.io with your master host; if it does not resolve, add a host entry to your /etc/hosts file) and telnet akhldz.master.io 7077 (if this does not connect, your worker won't connect either).
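A minimal sketch of those two checks as shell commands (akhldz.master.io stands in for your actual master hostname):

host akhldz.master.io         # should print the master's IP; if not, add it to /etc/hosts on the worker
telnet akhldz.master.io 7077  # should connect; "Connection refused" means the worker cannot reach the master either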

=> Add a host entry in /etc/hosts

Open /etc/hosts on your worker machine and add the following entry (example):

192.168.100.20   akhldz.master.io

PS: In the above case, Pillis had two IP addresses with the same host name, e.g.:

192.168.100.40  s1.machine.org
192.168.100.41  s1.machine.org

Hope that helps.

JimLohse

There are a lot of answers and possible solutions, and this question is a bit old, but in the interest of completeness, there is a known Spark bug about hostnames resolving to IP addresses. I am not presenting this as the complete answer in all cases, but I suggest trying with a baseline of just using IP addresses everywhere, and only using the single config SPARK_MASTER_IP. With just those two practices I get my clusters to work, and all the other configs, or using hostnames, just seem to muck things up.

So in your spark-env.sh get rid of SPARK_WORKER_IP and change SPARK_MASTER_IP to an IP address, not a hostname.
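A minimal sketch of what that spark-env.sh might look like, reusing 10.11.52.223 from the question as the master's address (your IP will differ):

# conf/spark-env.sh
# SPARK_WORKER_IP removed entirely
SPARK_MASTER_IP=10.11.52.223   # an IP address, not a hostname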

I have treated this at greater length in this answer.

For more completeness here's part of that answer:

Can you ping the box where the Spark master is running? Can you ping the worker from the master? More importantly, can you ssh without a password from the master box to the worker? Per the 1.5.2 docs you need to be able to do that with a private key AND have the worker entered in the conf/slaves file. I copied the relevant paragraph at the end.

You can get a situation where the worker can contact the master but the master can't get back to the worker so it looks like no connection is being made. Check both directions.

I think problems with the slaves file on the master node, and with password-less ssh, can lead to errors similar to what you are seeing.
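A rough sketch of those checks, run from the master box (worker-host is a placeholder for your worker's hostname or IP):

ping -c 1 worker-host            # basic reachability from master to worker
ssh worker-host hostname         # should print the worker's hostname without prompting for a password
cat $SPARK_HOME/conf/slaves      # the worker must be listed here, one host per line

If the ssh step prompts for a password, set up key-based login first (ssh-keygen on the master, then ssh-copy-id worker-host).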

Per the answer I crosslinked, there's also an old bug but it's not clear how that bug was resolved.

Set the port for the Spark worker as well, e.g.: SPARK_WORKER_PORT=5078 ... check the spark-installation link for correct installation
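For example, a spark-env.sh sketch that pins the worker ports (5078 is just the example value above):

# conf/spark-env.sh on each worker
SPARK_WORKER_PORT=5078          # fixed port for the worker instead of a random one
SPARK_WORKER_WEBUI_PORT=8081    # optionally pin the worker web UI port too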

Basically your ports are blocked, so communication from master to worker is cut off. Check here: https://spark.apache.org/docs/latest/configuration.html#networking

In the "Networking" section, you can see some of the ports are by default random. You can set them to your choice like below:

import org.apache.spark.{SparkConf, SparkContext}

// Pin the ports that Spark would otherwise choose at random,
// so they can be opened explicitly in the firewall
val conf = new SparkConf()
    .setMaster(master)
    .setAppName("namexxx")
    .set("spark.driver.port", "51810")
    .set("spark.fileserver.port", "51811")
    .set("spark.broadcast.port", "51812")
    .set("spark.replClassServer.port", "51813")
    .set("spark.blockManager.port", "51814")
    .set("spark.executor.port", "51815")

val sc = new SparkContext(conf)

In my case, I could overcome the problem by adding an entry with the hostname and IP address of the local host to the /etc/hosts file, as follows:

For a cluster, the master has the following /etc/hosts content:

127.0.0.1       master.yourhost.com localhost localhost4 localhost.localdomain
192.168.1.10    slave1.yourhost.com
192.168.1.9     master.yourhost.com   # this line solved the problem

Then I also did the SAME THING on the slave1.yourhost.com machine.
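A sketch of what the matching /etc/hosts on slave1.yourhost.com would look like with the same example addresses (the exact lines on your machines may differ):

127.0.0.1       slave1.yourhost.com localhost localhost4 localhost.localdomain
192.168.1.9     master.yourhost.com
192.168.1.10    slave1.yourhost.com   # the equivalent of the line that solved the problem, for this host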

Hope this helps.

I had faced the same issue. You can resolve it with the procedure below: first, go to the /etc/hosts file and comment out the 127.0.1.1 entry; then go to the spark/sbin directory and start Spark with this command:

./start-all.sh 

Or you can use ./start-master.sh and ./start-slave.sh for the same purpose. Now if you run spark-shell or pyspark or any other Spark component, it will automatically create the Spark context object sc for you.
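For reference, a minimal sketch of starting the daemons individually, where <master-ip> stands in for your master's address:

cd $SPARK_HOME/sbin
./start-master.sh                           # starts the master; its web UI on port 8080 shows the spark:// URL
./start-slave.sh spark://<master-ip>:7077   # starts a worker and points it at that master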
