Hello world in zeppelin failed

Question

I just installed Apache Zeppelin (built from the latest source in the git repo) and confirmed that it is up and running on port 10008. I created a new notebook with a single line of code:

val a = "Hello World!"

I then ran this paragraph and got the error below:

java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
    at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
    at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
    at org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
    at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:139)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:137)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:257)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:104)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:197)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:304)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Any clue?

My backend is Spark 1.5, and I verified through the interpreter web interface that Zeppelin points to the right version of Spark and the appropriate spark.home.

Answer 1:

This error can also occur when something goes wrong while Zeppelin is trying to create the interpreter.

Zeppelin starts the interpreter in a separate process and tries to connect to it using the Thrift protocol. If that process never comes up, nothing is listening on the other end and the connection is refused.

In my case, I got this error when trying to assign 5 GB to the Spark driver in spark-defaults.conf. It was resolved by commenting out this line (or assigning 4g or less):

#spark.driver.memory              5g
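
To confirm which value is actually active (and not commented out), the file can be read programmatically. This is only a rough sketch: the helper name read_spark_defaults is made up for illustration, and it assumes the simple whitespace-separated "key value" layout shown above rather than every syntax Spark accepts.

```python
from pathlib import Path


def read_spark_defaults(path):
    """Parse a spark-defaults.conf style file into a dict.

    Assumption: one "key <whitespace> value" pair per line; blank lines
    and lines starting with '#' are ignored.
    """
    props = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and commented-out settings
        parts = line.split(None, 1)  # split on the first run of whitespace
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        props[key] = value
    return props

# Example call (the path is a placeholder for your own installation):
# read_spark_defaults("/path/to/spark/conf/spark-defaults.conf").get("spark.driver.memory")
```

If the lookup returns None, the setting is absent or commented out and Spark falls back to its default driver memory.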

You could have a look at this JIRA issue: ZEPPELIN-305.

EDIT:

This error can be caused by anything that prevents the Spark interpreter process from starting. Recently, I got it when adding JMX options to ZEPPELIN_JAVA_OPTS, which caused the interpreter process to use the same JMX port as the Zeppelin process and fail with a "Port Already in Use" error.

Please check the Zeppelin logs (by default they are in ZEPPELIN_DIR/logs/) to see what happens when Zeppelin tries to start the Spark interpreter.
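
If the log directory is large, a small script can pull out the suspicious lines. The following is a minimal sketch, not part of Zeppelin: the function name find_error_lines, the choice of the ".log" and ".out" suffixes, and the "ERROR" / "Exception" markers are all assumptions you may need to adjust.

```python
from pathlib import Path


def find_error_lines(log_dir, markers=("ERROR", "Exception")):
    """Return (file name, line number, text) for log lines containing any marker.

    Assumption: the interesting files in log_dir end in .log or .out.
    """
    hits = []
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file() or path.suffix not in (".log", ".out"):
            continue
        # errors="replace" keeps the scan going if a log has odd bytes
        with path.open(errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if any(marker in line for marker in markers):
                    hits.append((path.name, lineno, line.rstrip("\n")))
    return hits

# Example call (the directory is a placeholder):
# for name, lineno, text in find_error_lines("/path/to/zeppelin/logs"):
#     print(f"{name}:{lineno}: {text}")
```

The last few hits in the interpreter log are usually the most relevant, since they come from the most recent failed start.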

Answer 2:

I had this issue when $SPARK_HOME was not set correctly.
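
A quick sanity check for this can be scripted. The sketch below is illustrative only (check_spark_home is a hypothetical helper, not a Zeppelin or Spark API); it assumes that a usable Spark installation has a bin/spark-submit launcher under SPARK_HOME.

```python
import os


def check_spark_home(env=None):
    """Return a description of the problem with SPARK_HOME, or None if it looks fine."""
    env = os.environ if env is None else env
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        return "SPARK_HOME is not set"
    if not os.path.isdir(spark_home):
        return f"SPARK_HOME is not a directory: {spark_home}"
    # Assumption: a valid Spark distribution ships bin/spark-submit
    if not os.path.isfile(os.path.join(spark_home, "bin", "spark-submit")):
        return f"bin/spark-submit not found under {spark_home}"
    return None

# Example call: prints None when the environment looks usable
# print(check_spark_home())
```

Note that this has to run in the same environment as the Zeppelin daemon; a value exported in your login shell is not necessarily the one the interpreter process sees.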

Answer 3:

An error stack like [1] below can mean many different things: the Zeppelin server could not connect to a local interpreter because it did not start or died. This looks like a Zeppelin bug, since it can't detect when interpreter.sh exits without creating a Zeppelin interpreter process. I submitted https://issues.apache.org/jira/browse/ZEPPELIN-1984 to track that.

In all our cases, which had different root causes, the real error was only revealed after adding

LOG="/tmp/interpreter.sh-$$.log"
date >> $LOG
set -x
exec >> $LOG
exec 2>&1

to $ZEPPELIN_HOME/bin/interpreter.sh, so that a /tmp/interpreter.sh-*.log file would show the actual problem.

[1]

ERROR [2017-01-18 16:54:38,533] ({pool-2-thread-2} NotebookServer.java[afterStatusChange]:1645) - Error
org.apache.zeppelin.interpreter.InterpreterException: org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:232)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:400)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:105)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:316)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)

Edit: another way to reveal the true root cause is to change the log4j configuration so that you see the output of the Spark interpreter process, as hinted by Jeff in ZEPPELIN-1984. Change your ZEPPELIN_HOME/conf/log4j.properties as follows:

log4j.rootLogger = INFO, dailyfile

log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p [%d] ({%t} %F[%M]:%L) - %m%n

log4j.appender.dailyfile.DatePattern=.yyyy-MM-dd
log4j.appender.dailyfile.Threshold = DEBUG
log4j.appender.dailyfile = org.apache.log4j.DailyRollingFileAppender
log4j.appender.dailyfile.File = ${zeppelin.log.file}
log4j.appender.dailyfile.layout = org.apache.log4j.PatternLayout
log4j.appender.dailyfile.layout.ConversionPattern=%5p [%d] ({%t} %F[%M]:%L) - %m%n

log4j.logger.org.apache.zeppelin.interpreter.InterpreterFactory=DEBUG
log4j.logger.org.apache.zeppelin.notebook.Paragraph=DEBUG
log4j.logger.org.apache.zeppelin.scheduler=DEBUG
log4j.logger.org.apache.zeppelin.livy=DEBUG
log4j.logger.org.apache.zeppelin.flink=DEBUG
log4j.logger.org.apache.zeppelin.spark=DEBUG
log4j.logger.org.apache.zeppelin.python=DEBUG
log4j.logger.org.apache.zeppelin.interpreter.util=DEBUG
log4j.logger.org.apache.zeppelin.interpreter.remote=DEBUG
log4j.logger.org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer=DEBUG

and restart Zeppelin. Note: this may produce excessive logging. My original advice of adding a few lines to interpreter.sh doesn't require restarting Zeppelin.

I also created a pull request to (partially) fix this issue: https://github.com/apache/zeppelin/pull/1921

Update 1/24/2017: https://issues.apache.org/jira/browse/ZEPPELIN-1984 is fixed in master and will be included in the Zeppelin 0.8 release. Two important fixes are part of ZEPPELIN-1984:

1. You won't get "connection refused" when an interpreter process can't start.
2. Zeppelin will show the root cause in the paragraph output.

Answer 4:

I noticed that the URL pointing to Spark was not correct. Once I corrected it, it worked fine. Thanks anyway.

Answer 5:

I had the same issue when $YARN_QUEUE was set incorrectly.

Answer 6:

This question has been open for a year now, and I'm not sure whether a solution was ever found. Recently, I ran into a similar error using Spark on YARN on Amazon EMR. While debugging it, I worked out the following steps, which I'd suggest to anyone in a similar situation (the solution is based on EMR, but should be similar on other offerings):

1. kill -9 `ps -ef | grep zeppelin | grep -v grep | awk '{print $2}'` (this makes sure zombie processes are taken care of)
2. kill -9 `ps -ef | grep hadoop-yarn-resourcemanager | grep -v grep | awk '{print $2}'`
3. sudo /sbin/restart hadoop-yarn-resourcemanager
4. At times, simply restarting the resource manager does not start the name node; in that case run `sudo start hadoop-hdfs-namenode`
5. sudo /usr/lib/zeppelin/bin/zeppelin-daemon.sh start
6. Use telnet to make sure that the default ports are open for the required services.

At the end of these steps, one should be able to get Zeppelin running properly with a valid SparkContext. Hope this was useful.
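
The port check in step 6 can also be scripted instead of using telnet by hand. This is a minimal sketch with a hypothetical helper name (closed_ports); which ports to check depends on your cluster configuration, so the list is left to the caller.

```python
import socket


def closed_ports(host, ports, timeout=2.0):
    """Return the ports on host that do not accept a TCP connection."""
    closed = []
    for port in ports:
        try:
            # Open and immediately close a connection, like a telnet probe
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            # Covers "connection refused", timeouts, and unreachable hosts
            closed.append(port)
    return closed

# Example call (8080 is an assumed Zeppelin port; substitute your own ports):
# print(closed_ports("localhost", [8080]))
```

An empty list means every listed service accepted a connection; any port that comes back is a candidate for the service that failed to start.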

Answer 7:

In my case, (project-root)/node_modules/zeppelin/spark-2.0.2-bin-hadoop2.7 was not installed, for some unknown reason. Running rm -rf node_modules; npm cache clear; npm i fixed it.

Answer 8:

I fixed this by changing the Spark mode from yarn-cluster to yarn-client, as set in zepplin/conf/defalt.sh.

Answer 9:

I got exactly the same error when I tried to run Zeppelin and Spark in the same Docker container on a micro instance in Amazon ECS.

The source of the error is visible in the output log in %ZEPPELIN_HOME%/logs/*.out, which said that Zeppelin failed to start the Spark interpreter due to low memory. So I moved my Docker image to an instance with more memory.

Answer 10:

In my case, I have three nodes in my cluster. Although Spark was installed on all three of them, Zeppelin was installed on only one.

So in the Zeppelin Interpreter menu --> Spark --> Edit --> Properties --> Master

changing that parameter from yarn-client to local[*] fixed my problem.

Source: https://stackoverflow.com/questions/32735645/hello-world-in-zeppelin-failed

Tags

apache-spark

apache-zeppelin