Apache Zeppelin & Spark Streaming: Twitter Example only works local

问题

I just added the example project to my Zeppelin Notebook from http://zeppelin-project.org/docs/tutorial/tutorial.html (section "Tutorial with Streaming Data"). The problem I now have is that the application seems only to work local. If I change the Spark interpreter setting "master" from "local[*]" to "spark://master:7077" the application won't bring any result anymore when I'm doing the same SQL statement. Am I doing anything wrong? I already restarted the Zeppelin interpreter, also the whole Zeppelin daemon and the Spark cluster, nothing solved the issue! Can someone help.

I use the following installation:

Spark 1.5.1 (prebuild for Hadoop 2.6+), Master + 2x Slaves
Zeppelin 0.5.5 (installed on Spark's master node)

EDIT Also the following installation won't work for me:

Spark 1.5.0 (prebuild for Hadoop 2.6+), Master + 2x Slaves
Zeppelin 0.5.5 (installed on Spark's master node)

Screenshot: local setting (works!)

Screenshot: cluster setting (won't work!)

The job seems to run correctly in cluster mode:

回答1:

I got it after 2 days of trying around!

The difference between the local Zeppelin Spark interpreter and the Spark Cluster seems to be, that the local one has included the Twitter Utils which are needed for executing the Twitter Streaming example, and the Spark Cluster doesn't have this library by default.

Therefore you have to add the dependency manually in the Zeppelin Notebook before starting the application with Spark cluster as master. So the first paragraph of the Notebook must be:

%dep
z.reset
z.load("org.apache.spark:spark-streaming-twitter_2.10:1.5.1")

If an error occures on running this paragraph, just try to restart the Zeppelin server via ./bin/zeppelin-daemon.sh stop (& start)!

来源：https://stackoverflow.com/questions/34296894/apache-zeppelin-spark-streaming-twitter-example-only-works-local

标签

apache-spark

apache-spark-sql

spark-streaming

apache-zeppelin