Why Zeppelin notebook is not able to connect to S3

Submitted by 生来就可爱 on 2019-12-01 07:34:40

The following installation worked for me (I also spent many days on this problem):

  1. Spark 1.3.1, prebuilt for Hadoop 2.3, set up on an EC2 cluster

  2. git clone https://github.com/apache/incubator-zeppelin.git (cloned on 25.07.2015)

  3. Built Zeppelin with the following command (following the instructions at https://github.com/apache/incubator-zeppelin):

    mvn clean package -Pspark-1.3 -Dhadoop.version=2.3.0 -Phadoop-2.3 -DskipTests

  4. Changed the port to 8082 via "conf/zeppelin-site.xml" (Spark already uses port 8080)
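
The port change in step 4 can be sketched as the following fragment of conf/zeppelin-site.xml (a sketch; verify the property name against the conf/zeppelin-site.xml.template shipped with your Zeppelin build):

```xml
<property>
  <name>zeppelin.server.port</name>
  <value>8082</value>
  <description>Zeppelin server port, moved off 8080 to avoid clashing with the Spark UI</description>
</property>
```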

After these installation steps, my notebook worked with S3 files:

// s3n credentials in the Hadoop configuration (replace "xxx" with your keys)
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xxx")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "xxx")
val file = "s3n://<<bucket>>/<<file>>"
val data = sc.textFile(file)
data.first

I think the S3 problem is not completely resolved in Zeppelin 0.5.0, so building from the current git repo did it for me.

Important: the job only worked for me with the Zeppelin Spark interpreter setting master=local[*] (instead of spark://master:7777).

For me it worked in two steps:

1. Create a SQLContext:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

2. Read the S3 files, supplying the access key and secret key in the s3n path itself:

val performanceFactor = sqlContext.read.parquet("s3n://<accessKey>:<secretKey>@mybucket/myfile/")
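
One caveat with embedding credentials in the path: AWS secret keys often contain "/" characters, which break the s3n://accessKey:secretKey@bucket URI form unless the secret is URL-encoded first. A minimal sketch (the key values and bucket name here are hypothetical placeholders):

```scala
import java.net.URLEncoder

// Hypothetical credentials for illustration only.
val accessKey = "AKIAEXAMPLE"
val secretKey = "abc/def+ghi" // a '/' in the raw secret would split the URI

// Percent-encode the secret so special characters survive inside the URI.
val encodedSecret = URLEncoder.encode(secretKey, "UTF-8")
val path = s"s3n://$accessKey:$encodedSecret@mybucket/myfile/"
```

If encoding still causes trouble, setting the keys via sc.hadoopConfiguration (as in the first answer) avoids the URI form entirely.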
