How to use Zeppelin to access an AWS spark-ec2 cluster and S3 buckets

試著忘記壹切 submitted on 2019-12-10 13:48:35

Question


I have an AWS EC2 cluster set up by the spark-ec2 script.

I would like to configure Zeppelin so that I can write Scala code locally in Zeppelin and run it on the cluster (via the master). Furthermore, I would like to be able to access my S3 buckets.

I followed this guide and this other one, but I cannot seem to run Scala code from Zeppelin against my cluster.

I installed Zeppelin locally with

mvn install -DskipTests -Dspark.version=1.4.1 -Dhadoop.version=2.7.1

My security groups were set to both AmazonEC2FullAccess and AmazonS3FullAccess.

In the Zeppelin web app I edited the Spark interpreter properties, changing the master from local[*] to spark://.us-west-2.compute.amazonaws.com:7077.
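For reference, the edited interpreter property would look something like this (the ec2-xx hostname below is a placeholder, not taken from the question; substitute your master's actual public DNS):

master    spark://ec2-xx-xx-xx-xx.us-west-2.compute.amazonaws.com:7077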

  1. When I test out

    sc
    

    in the interpreter, I receive this error:

    java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
        at ...
    
  2. When I try to edit "conf/zeppelin-site.xml" to change my port to 8082 (see the snippet below for the property involved), it makes no difference.
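For reference, the port is controlled by the zeppelin.server.port property in conf/zeppelin-site.xml; a minimal sketch of that edit, using the 8082 value from the question:

<property>
  <name>zeppelin.server.port</name>
  <value>8082</value>
  <description>Server port.</description>
</property>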

NOTE: Eventually I would also want to access my S3 buckets with something like:

// Hadoop s3n credentials (keys redacted)
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xxx")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "xxx")
// Read a file from the bucket and print its first line
val file = "s3n://<<bucket>>/<<file>>"
val data = sc.textFile(file)
data.first

If any benevolent users have any advice (that wasn't already posted on Stack Overflow), please let me know!


Answer 1:


Most likely your IP address is blocked from connecting to your Spark cluster. You can test this by launching spark-shell pointed at that endpoint (or even just telnetting to it). To fix it, log into your AWS account and change the firewall (security group) settings. It's also possible that the interpreter isn't pointed at the correct host: I'm assuming you redacted the specific machine from spark://.us-west-2.compute.amazonaws.com:7077, but if not, there should be a host component before the .us-west-2. You can also try SSHing to that machine and running netstat --tcp -l -n to see if it's listening (or even just ps aux | grep java to see if Spark is running); a sketch of these checks follows.
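A minimal sketch of those checks, assuming a placeholder hostname (ec2-xx-xx-xx-xx.us-west-2.compute.amazonaws.com) that you would replace with your master's actual public DNS:

# From your local machine: can you reach the master port at all?
telnet ec2-xx-xx-xx-xx.us-west-2.compute.amazonaws.com 7077

# Or point a local spark-shell at the same master URL
spark-shell --master spark://ec2-xx-xx-xx-xx.us-west-2.compute.amazonaws.com:7077

# On the master itself (after SSHing in): is anything listening on port 7077?
netstat --tcp -l -n | grep 7077

# Is a Spark (Java) process running at all?
ps aux | grep java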



Source: https://stackoverflow.com/questions/32557710/how-to-use-zeppelin-to-access-aws-spark-ec2-cluster-and-s3-buckets
