SparkR on Rstudio - cannot access s3


Question

I have installed SparkR with R (and Rstudio) on EC2. I'm trying to read files located on s3:

temp  <- textFile(sc, "s3://dev.xxxx.com/txttest")

and get:

java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be
specified as the username or password (respectively) of a s3 URL, or by setting the
fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
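
For reference, the two properties named in the error are Hadoop configuration settings, not Spark ones. A minimal sketch of setting them on a live SparkR context uses SparkR's internal callJMethod bridge; note that the triple-colon call is an internal API that may change between releases, and the key values below are placeholders:

# Sketch: set the Hadoop S3 credentials on the running SparkContext.
# SparkR:::callJMethod is internal API, so this may break across versions.
hConf <- SparkR:::callJMethod(sc, "hadoopConfiguration")
SparkR:::callJMethod(hConf, "set", "fs.s3.awsAccessKeyId", "11111111111111111111")
SparkR:::callJMethod(hConf, "set", "fs.s3.awsSecretAccessKey", "2222222222222222222222222222222222222222")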

I've tried to add my access key + secret like so:

temp  <- textFile(sc, "s3://{access_key}:{secret_key}@dev.xxxx.com/txttest")

and got:

Invalid hostname in URI s3://11111111111111111111:2222222222222222222222222222222222222222@dev.xxx.com
at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:41)
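
A commonly cited cause of this parse failure is that java.net.URI rejects user-info containing characters such as "/" or "+", which AWS secret keys often include (the digits above look like redactions). A hedged workaround sketch, URL-encoding the secret before building the URI, with placeholder values:

# Sketch: URL-encode the secret key before embedding it in the s3:// URI,
# in case it contains characters ("/", "+") that break URI parsing.
access_key <- "11111111111111111111"
secret_key <- URLencode("2222222222222222222222222222222222222222", reserved = TRUE)
temp <- textFile(sc, paste0("s3://", access_key, ":", secret_key, "@dev.xxxx.com/txttest"))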

I also tried to use

export AWS_SECRET_ACCESS_KEY=2222222222222222222222222222222222222222
export AWS_ACCESS_KEY_ID=11111111111111111111

before launching the cluster but to no avail.
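
Shell exports only help if the JVM launched from RStudio actually inherits them, which is not guaranteed for RStudio Server sessions. A sketch of setting the variables inside R itself before initializing the context (values are placeholders):

# Sketch: set the AWS credentials inside the R session so the driver JVM
# started by sparkR.init inherits them.
Sys.setenv(AWS_ACCESS_KEY_ID = "11111111111111111111",
           AWS_SECRET_ACCESS_KEY = "2222222222222222222222222222222222222222")
sc <- sparkR.init(master = "local")  # re-initialize after setting the env vars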

Questions:
1. How can I change the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties?
2. Is there a correct syntax I'm missing in the URI?

Any help would be greatly appreciated.

Source: https://stackoverflow.com/questions/29898880/sparkr-on-rstudio-cannot-access-s3
