Question
I know I can output my Spark DataFrame to AWS S3 as a CSV file with
df.repartition(1).write.csv('s3://my-bucket-name/df_name')
My question is: is there an easy way to set the Access Control List (ACL) of this file to 'bucket-owner-full-control'
when writing it to S3 using PySpark?
Answer 1:
I don't know about the EMR S3 connector; in the ASF S3A connector you set the option fs.s3a.acl.default
when you open the connection. It cannot be set on a file-by-file basis.
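A minimal PySpark sketch of that approach, assuming the S3A connector is on the classpath (s3a:// paths); the app name, bucket and output path are placeholders. The option is passed with the spark.hadoop. prefix so it lands in the Hadoop configuration before any S3A connection is opened:

from pyspark.sql import SparkSession

# fs.s3a.acl.default must be in place before the S3A filesystem is first opened;
# the spark.hadoop. prefix copies the value into the Hadoop configuration.
spark = (
    SparkSession.builder
    .appName('YourAppName')
    .config('spark.hadoop.fs.s3a.acl.default', 'BucketOwnerFullControl')
    .getOrCreate()
)

# The ACL then applies to every object this connector writes, not per file.
df = spark.createDataFrame([(1, 'a'), (2, 'b')], ['id', 'value'])
df.repartition(1).write.csv('s3a://my-bucket-name/df_name')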
Answer 2:
The Access Control List (ACL) can be set via the Hadoop configuration after building the Spark session.
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('YourAppName').getOrCreate()
Set the ACL as below (in PySpark the Hadoop configuration is reached through the underlying JVM context, sparkContext._jsc):
spark.sparkContext._jsc.hadoopConfiguration().set('fs.s3.canned.acl', 'BucketOwnerFullControl')
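For completeness, a hedged end-to-end sketch of this answer's approach; fs.s3.canned.acl is the EMR s3:// filesystem's property, and the bucket and path below are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('YourAppName').getOrCreate()

# EMRFS picks up the canned ACL from the Hadoop configuration at write time.
spark.sparkContext._jsc.hadoopConfiguration().set('fs.s3.canned.acl', 'BucketOwnerFullControl')

df = spark.createDataFrame([(1, 'a')], ['id', 'value'])
df.repartition(1).write.csv('s3://my-bucket-name/df_name')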
Reference: s3 documentation
Answer 3:
Ran into the exact same issue: a Spark job wrote files to a bucket with server-side encryption set, and accessing them resulted in Access Denied. After reading some blogs, I learned that this can be solved by setting the fs.s3a.acl.default parameter to BucketOwnerFullControl.
Here is the code:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("YourAppName").getOrCreate()
spark.sparkContext.hadoopConfiguration.set("fs.s3a.acl.default", "BucketOwnerFullControl")
Source: https://stackoverflow.com/questions/52673924/how-to-assign-the-access-control-list-acl-when-writing-a-csv-file-to-aws-in-py