Running EMR Spark With Multiple S3 Accounts

隐瞒了意图╮ 2020-12-29 11:12

I have an EMR Spark Job that needs to read data from S3 on one account and write to another.
I split my job into two steps.

read data from the S3 (no

4 answers

甜味超标 (original poster)

2020-12-29 11:35
With Spark you can also use assume-role to access an S3 bucket in another account, using an IAM role defined in that other account. This makes it easier for the other account's owner to manage the permissions granted to the Spark job. Managing access through S3 bucket policies can be a pain, because access rights end up spread across multiple locations rather than contained in a single IAM role.

Here is the hadoopConfiguration:
```
// Settings for the Databricks S3A filesystem: assume an IAM role in the other account
"fs.s3a.credentialsType" -> "AssumeRole",
"fs.s3a.stsAssumeRole.arn" -> "arn:aws:iam::<>:role/<>",
"fs.s3a.impl" -> "com.databricks.s3a.S3AFileSystem",
// Encrypt written objects server-side with the given KMS key
"spark.hadoop.fs.s3a.server-side-encryption-algorithm" -> "aws:kms",
"spark.hadoop.fs.s3a.server-side-encryption-kms-master-key-id" -> "arn:aws:kms:ap-southeast-2:<>:key/<>"
```
External IDs can also be used as a passphrase:
```
"spark.hadoop.fs.s3a.stsAssumeRole.externalId" -> "GUID created by other account owner"
```
We were using Databricks for the above and have not tried it on EMR yet.
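
As a rough sketch of how these entries might be assembled and applied, the helper below builds the same map and strips the `spark.hadoop.` prefix. Spark copies properties with that prefix from its own configuration into the Hadoop configuration with the prefix removed, so keys set directly on `hadoopConfiguration` would normally be written without it. `AssumeRoleConf`, `settings`, and `toHadoopKeys` are hypothetical names made up for illustration, and the key names are taken from the Databricks setup above, so they may not carry over to EMR unchanged.

```scala
// Sketch only: AssumeRoleConf and its methods are hypothetical helpers,
// not part of Spark, Hadoop, or Databricks.
object AssumeRoleConf {
  private val SparkHadoopPrefix = "spark.hadoop."

  /** Builds the settings shown above; the external ID is added only when supplied. */
  def settings(
      roleArn: String,
      kmsKeyArn: String,
      externalId: Option[String] = None
  ): Map[String, String] = {
    val base = Map(
      "fs.s3a.credentialsType" -> "AssumeRole",
      "fs.s3a.stsAssumeRole.arn" -> roleArn,
      "fs.s3a.impl" -> "com.databricks.s3a.S3AFileSystem",
      "spark.hadoop.fs.s3a.server-side-encryption-algorithm" -> "aws:kms",
      "spark.hadoop.fs.s3a.server-side-encryption-kms-master-key-id" -> kmsKeyArn
    )
    val optional =
      externalId.map(id => "spark.hadoop.fs.s3a.stsAssumeRole.externalId" -> id).toList
    base ++ optional
  }

  /** Drops the "spark.hadoop." prefix so keys can be set directly on a Hadoop Configuration. */
  def toHadoopKeys(conf: Map[String, String]): Map[String, String] =
    conf.map { case (key, value) => key.stripPrefix(SparkHadoopPrefix) -> value }
}

// Usage, assuming an existing SparkSession named `spark` and your own ARNs:
//   val conf = AssumeRoleConf.toHadoopKeys(AssumeRoleConf.settings(roleArn, kmsKeyArn))
//   conf.foreach { case (k, v) => spark.sparkContext.hadoopConfiguration.set(k, v) }
```

Keeping the role ARN, KMS key, and external ID as parameters means the same job can be pointed at a different target account without editing the settings themselves.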