How to use Zeppelin to access an AWS spark-ec2 cluster and S3 buckets

試著忘記壹切 submitted on 2019-12-10 13:48:35

Question


I have an AWS EC2 cluster set up by the spark-ec2 script.

I would like to configure Zeppelin so that I can write Scala code locally in Zeppelin and run it on the cluster (via the master). Furthermore, I would like to be able to access my S3 buckets.

I followed this guide and this other one, but I cannot seem to run Scala code from Zeppelin against my cluster.

I installed Zeppelin locally with

mvn install -DskipTests -Dspark.version=1.4.1 -Dhadoop.version=2.7.1

My security groups were set to both AmazonEC2FullAccess and AmazonS3FullAccess.

In the Zeppelin web app I edited the Spark interpreter properties, changing the master from local[*] to spark://.us-west-2.compute.amazonaws.com:7077.
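For reference, the edited interpreter property would look something like this (the ec2-xx hostname below is a placeholder, not taken from the question; substitute your master's actual public DNS):

master    spark://ec2-xx-xx-xx-xx.us-west-2.compute.amazonaws.com:7077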

  1. When I test out

    sc
    

    in the interpreter, I receive this error:

    java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
        at ...
    
  2. When I try to edit "conf/zeppelin-site.xml" to change my port to 8082 (see the snippet below for the property involved), it makes no difference.
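For reference, the port is controlled by the zeppelin.server.port property in conf/zeppelin-site.xml; a minimal sketch of that edit, using the 8082 value from the question:

<property>
  <name>zeppelin.server.port</name>
  <value>8082</value>
  <description>Server port.</description>
</property>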

NOTE: Eventually I would also want to access my S3 buckets with something like:

// Hadoop s3n credentials (keys redacted)
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xxx")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "xxx")
// Read a file from the bucket and print its first line
val file = "s3n://<<bucket>>/<<file>>"
val data = sc.textFile(file)
data.first

If any benevolent users have any advice (that wasn't already posted on Stack Overflow), please let me know!


Answer 1:


Most likely your IP address is blocked from connecting to your Spark cluster. You can test this by launching spark-shell pointed at that endpoint (or even just telnetting to it). To fix it, log into your AWS account and change the firewall (security group) settings. It's also possible that the interpreter isn't pointed at the correct host: I'm assuming you redacted the specific machine from spark://.us-west-2.compute.amazonaws.com:7077, but if not, there should be a host component before the .us-west-2. You can also try SSHing to that machine and running netstat --tcp -l -n to see if it's listening (or even just ps aux | grep java to see if Spark is running); a sketch of these checks follows.
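A minimal sketch of those checks, assuming a placeholder hostname (ec2-xx-xx-xx-xx.us-west-2.compute.amazonaws.com) that you would replace with your master's actual public DNS:

# From your local machine: can you reach the master port at all?
telnet ec2-xx-xx-xx-xx.us-west-2.compute.amazonaws.com 7077

# Or point a local spark-shell at the same master URL
spark-shell --master spark://ec2-xx-xx-xx-xx.us-west-2.compute.amazonaws.com:7077

# On the master itself (after SSHing in): is anything listening on port 7077?
netstat --tcp -l -n | grep 7077

# Is a Spark (Java) process running at all?
ps aux | grep java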



Source: https://stackoverflow.com/questions/32557710/how-to-use-zeppelin-to-access-aws-spark-ec2-cluster-and-s3-buckets
