Spark Shell Listens on localhost instead of configured IP address

Submitted anonymously (unverified) on 2019-12-03 09:14:57

Question:

I am trying to run a simple Spark job via spark-shell, and it looks like the BlockManager for the spark-shell driver listens on localhost instead of the configured IP address, which causes the job to fail. The exception thrown is "Failed to connect to localhost".

Here is my configuration:

Machine 1(ubunt64): Spark master [192.168.253.136]

Machine 2(ubuntu64server): Spark Slave [192.168.253.137]

Machine 3(ubuntu64server2): Spark Shell Client[192.168.253.138]

Spark version: spark-1.3.0-bin-hadoop2.4

Environment: Ubuntu 14.04

Source Code to be executed in Spark Shell:

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext

    var conf = new SparkConf().setMaster("spark://192.168.253.136:7077")
    conf.set("spark.driver.host", "192.168.253.138")
    conf.set("spark.local.ip", "192.168.253.138")
    sc.stop
    var sc = new SparkContext(conf)
    val textFile = sc.textFile("README.md")
    textFile.count()

The above code works fine if I run it on Machine 2, where the slave is running, but it fails on Machine 1 (master) and Machine 3 (Spark shell client).

I am not sure why spark-shell listens on localhost instead of the configured IP address. I have set SPARK_LOCAL_IP on Machine 3 in spark-env.sh as well as in .bashrc (export SPARK_LOCAL_IP=192.168.253.138), and I confirmed that the spark-shell Java process does listen on port 44015. I just don't understand why it advertises the localhost address.

Any help troubleshooting this issue would be highly appreciated. I am probably missing some configuration setting.
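For anyone reproducing this, here is a minimal diagnostic sketch (assuming it is pasted into the same spark-shell session) that prints how the OS resolves the local host and which addresses the active context is actually configured with:

    // Diagnostic sketch (assumption: run inside the spark-shell session above).
    // Prints the OS-level hostname resolution and the driver/local addresses
    // held by the active SparkContext's configuration.
    println(java.net.InetAddress.getLocalHost.getHostName)
    println(java.net.InetAddress.getLocalHost.getHostAddress)
    println(sc.getConf.getOption("spark.driver.host"))
    println(sc.getConf.getOption("spark.local.ip"))

If the hostname resolves to a loopback address (a common /etc/hosts quirk on Ubuntu), that would be one plausible reason for the BlockManager advertising localhost.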

Logs:

scala> val textFile = sc.textFile("README.md")

15/04/22 18:15:22 INFO MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=280248975

15/04/22 18:15:22 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 267.1 MB)

15/04/22 18:15:22 INFO MemoryStore: ensureFreeSpace(22692) called with curMem=163705, maxMem=280248975

15/04/22 18:15:22 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.2 KB, free 267.1 MB)

15/04/22 18:15:22 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:44015 (size: 22.2 KB, free: 267.2 MB)

scala> textFile.count()

15/04/22 18:16:07 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (README.md MapPartitionsRDD[1] at textFile at :25)

15/04/22 18:16:07 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks

15/04/22 18:16:08 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, ubuntu64server, PROCESS_LOCAL, 1326 bytes)

15/04/22 18:16:23 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, ubuntu64server, PROCESS_LOCAL, 1326 bytes)

15/04/22 18:16:23 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, ubuntu64server): java.io.IOException: Failed to connect to localhost/127.0.0.1:44015
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Answer 1:

Found a work-around for this BlockManager localhost issue: provide the Spark master address when starting the shell (it can also be set in spark-defaults.conf).

./spark-shell --master spark://192.168.253.136:7077  

This way I didn't have to stop the Spark context, and the original context was able to read files as well as data from Cassandra tables.
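For completeness, once the shell is started with --master as above, the pre-built sc can be used directly; the snippet from the question then reduces to something like this sketch:

    // With --master supplied at shell start-up, there is no need to stop and
    // re-create the context; the built-in sc already points at the cluster.
    val textFile = sc.textFile("README.md")
    textFile.count()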

Here is the log of the BlockManager listening on localhost (after stopping and dynamically re-creating the context), which fails with the "Failed to connect" exception:

15/04/25 07:10:27 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:40235 (size: 1966.0 B, free: 267.2 MB) 

Compare that to it listening on the actual server name (when the Spark master is provided on the command line), which works:

15/04/25 07:12:47 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on ubuntu64server2:33301 (size: 1966.0 B, free: 267.2 MB) 

It looks like a bug in the BlockManager code when the context is dynamically re-created in the shell.

Hope this helps someone.


