amazon-s3

Spark FileAlreadyExistsException on Stage Failure

爷,独闯天下 submitted on 2020-08-24 07:57:05
Question: I am trying to write a dataframe to an S3 location after re-partitioning, but whenever the write stage fails and Spark retries the stage, it throws FileAlreadyExistsException. When I re-submit the job it works fine if Spark completes the stage in one try. Below is my code block:

df.repartition(<some-value>).write.format("orc").option("compression", "zlib").mode("Overwrite").save(path)

I believe Spark should remove the files from the failed stage before retrying. I understand this will be solved if we set
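For context, here is a self-contained PySpark sketch of the write described above; the input path, partition count, and output path are illustrative placeholders rather than values from the original post.

# Minimal PySpark sketch of the repartition-and-write from the question.
# All paths and the partition count are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orc-write-example").getOrCreate()
df = spark.read.parquet("s3a://source-bucket/input/")  # placeholder input

(df.repartition(200)
   .write
   .format("orc")
   .option("compression", "zlib")
   .mode("overwrite")
   .save("s3a://target-bucket/output/"))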

AWS S3 - Move an object to a different folder

Deadly submitted on 2020-08-22 12:11:37
Question: Is there any way to move an object to a different folder within the same bucket using the AWS SDK (preferably for .NET)? Searching around, all I can find is the suggestion to copy the object to the new location and delete the original (which is easy enough via "CopyObjectRequest" and "DeleteObjectRequest"), but I'm wondering whether that is the only way.

Answer 1: It turns out you can use Amazon.S3.IO.S3FileInfo to get the object and then call its "MoveTo" method to move it:

S3FileInfo currentObject = new
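The answer's snippet is cut off above. For comparison, the copy-then-delete approach mentioned in the question can be sketched in Python with boto3; the bucket name and keys below are placeholders, not values from the original post.

import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"                   # placeholder bucket name
source_key = "folder-a/file.txt"       # placeholder source key
target_key = "folder-b/file.txt"       # placeholder destination key

# S3 has no native "move": copy the object to the new key, then delete the original.
s3.copy_object(Bucket=bucket, Key=target_key,
               CopySource={"Bucket": bucket, "Key": source_key})
s3.delete_object(Bucket=bucket, Key=source_key)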

AWS: how to fix S3 event replacing spaces with '+' signs in object key names in JSON

ぐ巨炮叔叔 submitted on 2020-08-22 03:33:22
Question: I have a Lambda function to copy objects from bucket 'A' to bucket 'B', and everything was working fine until an object named 'New Text Document.txt' was created in bucket 'A'. In the JSON that gets built for the S3 event, the key appears as "key": "New+Text+Document.txt", so the spaces got replaced with '+'. I know it is a known issue from searching the web, but I am not sure how to fix it, because the incoming JSON itself has a '+' and a '+' can actually be part of the file name, like 'New+Text Document.txt'. So I
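The question is cut off above. A commonly suggested fix, offered here only as an illustrative sketch, is to URL-decode the key from the event with urllib.parse.unquote_plus: S3 event notifications URL-encode object keys, so spaces arrive as '+' while a literal '+' in the name should itself arrive percent-encoded.

import urllib.parse

def lambda_handler(event, context):
    record = event["Records"][0]
    raw_key = record["s3"]["object"]["key"]      # e.g. "New+Text+Document.txt"
    key = urllib.parse.unquote_plus(raw_key)     # -> "New Text Document.txt"
    # A literal '+' in the original object name is expected to arrive
    # percent-encoded (e.g. "%2B"), so the decode should stay unambiguous.
    print(key)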

How to get more than 1000 objects from S3 by using list_objects_v2?

泪湿孤枕 submitted on 2020-08-22 03:02:35
Question: I have more than 500,000 objects on S3 and I am trying to get the size of each object. I am using the following Python code for that:

import boto3
bucket = 'bucket'
prefix = 'prefix'
contents = boto3.client('s3').list_objects_v2(Bucket=bucket, MaxKeys=1000, Prefix=prefix)["Contents"]
for c in contents:
    print(c["Size"])

But it only gave me the sizes of the first 1000 objects. Based on the documentation, we can't get more than 1000 in a single call. Is there any way I can get more than that?

Answer 1: Use the ContinuationToken returned
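The answer is cut off above. To illustrate the continuation-token approach it points to, here is a sketch using boto3's list_objects_v2 paginator, which follows NextContinuationToken automatically; the bucket and prefix names mirror the placeholders in the question.

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

total_size = 0
# Each page holds up to 1000 keys; the paginator requests the next page
# with the continuation token until the listing is exhausted.
for page in paginator.paginate(Bucket="bucket", Prefix="prefix"):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])
        total_size += obj["Size"]
print("total bytes:", total_size)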

Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource

我们两清 submitted on 2020-08-20 12:09:26
Question: Our development team is trying to upload files into S3 with .NET and is facing the error in the title. The S3 bucket is configured with the CORS policy as follows:

<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <CORSRule>
    <AllowedOrigin>http://localhost:3000</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedMethod>HEAD</AllowedMethod>
    <AllowedMethod>PUT</AllowedMethod>
    <AllowedMethod>POST</AllowedMethod>
    <AllowedMethod>DELETE</AllowedMethod>
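The CORS XML is cut off above. As a side note, the same rules can also be applied programmatically; the following boto3 sketch is only an assumed equivalent of the visible part of that configuration (the bucket name and the AllowedHeaders entry are placeholders, since the original XML is truncated).

import boto3

s3 = boto3.client("s3")
cors_configuration = {
    "CORSRules": [
        {
            "AllowedOrigins": ["http://localhost:3000"],
            "AllowedMethods": ["GET", "HEAD", "PUT", "POST", "DELETE"],
            "AllowedHeaders": ["*"],   # assumption: not visible in the truncated XML
        }
    ]
}
# put_bucket_cors replaces any CORS configuration already set on the bucket.
s3.put_bucket_cors(Bucket="my-bucket", CORSConfiguration=cors_configuration)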

How to create a dictionary with folders as keys and files as values

流过昼夜 submitted on 2020-08-20 06:27:06
Question: I have a bucket named testfolder. Inside testfolder there are the folders test1, test2, and test3, and each folder contains CSV files. I need to create a key-value pair of each folder and its files. Expected output:

output1: { 'test1': ['csv1.csv'], 'test2': ['csv2'], 'test3': ['csv3'] }
output2: { 'test1': 'csv1.csv', 'test2': 'csv2', 'test3': 'csv3' }

# list all the objects
import boto3
s3 = boto3.client("s3")
final_data = {}
all_objects = s3.list_objects(Bucket='testfolder')
# list the objects in each subfolder
# create a dictionary

Answer 1:
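The answer body is cut off here. Purely as an illustration of the approach the question sets up (list the objects, then group them by their top-level prefix), a minimal boto3 sketch of the first expected output might look like this; the bucket name follows the question's example.

import boto3

s3 = boto3.client("s3")
bucket = "testfolder"   # bucket name from the question

final_data = {}
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket):
    for obj in page.get("Contents", []):
        key = obj["Key"]                          # e.g. "test1/csv1.csv"
        if "/" not in key or not key.endswith(".csv"):
            continue
        folder, file_name = key.split("/", 1)
        # Group each CSV file under its top-level "folder" prefix.
        final_data.setdefault(folder, []).append(file_name)

print(final_data)   # e.g. {'test1': ['csv1.csv'], 'test2': ['csv2.csv'], ...}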

How to extract the elements from csv to json in S3

六眼飞鱼酱① submitted on 2020-08-19 17:39:07
Question: I need to find the CSV files in a folder, list all the files inside the folder, convert the files to JSON, and save them in the same bucket. There are many CSV files like the one below:

emp_id,Name,Company
10,Aka,TCS
11,VeI,TCS

My code is below:

import boto3
import pandas as pd

def lambda_handler(event, context):
    s3 = boto3.resource('s3')
    my_bucket = s3.Bucket('testfolder')
    for file in my_bucket.objects.all():
        print(file.key)
    for csv_f in file.key:
        with open(f'{csv_f.replace(".csv", ".json")}', "w")
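The question's code is cut off above. Purely as an illustrative sketch (not the original answer), one way to convert each CSV in the bucket to JSON and write it back with pandas and boto3 might be the following; the bucket name follows the question's example.

import io

import boto3
import pandas as pd

def lambda_handler(event, context):
    s3 = boto3.client("s3")
    bucket = "testfolder"   # bucket name from the question
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.endswith(".csv"):
                continue
            # Read the CSV from S3, convert it to JSON records, and write the
            # result back next to the original with a .json extension.
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            df = pd.read_csv(io.BytesIO(body))
            json_body = df.to_json(orient="records")
            s3.put_object(Bucket=bucket, Key=key.replace(".csv", ".json"), Body=json_body)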