Python Boto3 - how to check if s3 file is completely written before process start copying to another bucket

Submitted by ぐ巨炮叔叔 on 2020-07-09 15:07:49

Question


How to make sure that Process A has completely written large file (5+ GB) in AWS S3 Bucket A before Process B starts copying file to AWS S3 Bucket B using boto3?


Answer 1:


If a new object is being created in Amazon S3, it will only appear after the upload is complete. Other processes will not be able to see it until it has finished uploading.

Objects in S3 cannot be updated in place; they are replaced with a new object. So while an object is being overwritten, other processes will still see the old object.
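Because an object only becomes visible once its upload completes, a simple (if less elegant than event notifications) check is to poll `HeadObject` until the key appears. This is a minimal sketch; the client is injected and the bucket/key names in the usage are hypothetical:

```python
import time


def wait_until_object_exists(s3_client, bucket, key,
                             delay=5.0, max_attempts=20):
    """Poll HeadObject until the key is visible, i.e. the upload finished.

    Returns True once the object exists, False after max_attempts tries.
    """
    for _ in range(max_attempts):
        try:
            # HeadObject succeeds only for a fully uploaded object;
            # boto3 raises ClientError (404) until then.
            s3_client.head_object(Bucket=bucket, Key=key)
            return True
        except Exception:
            time.sleep(delay)
    return False
```

With a real client this would be called as `wait_until_object_exists(boto3.client("s3"), "bucket-a", "big-file.bin")`; boto3 also ships a built-in `object_exists` waiter that does the same polling.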

The best way would be to trigger Process B by configuring Amazon S3 Event Notifications. Once the new object has finished uploading, S3 can invoke a Lambda function (or send a notification) that performs the second step.
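A Lambda handler for such an event notification might look like the sketch below. The destination bucket name is hypothetical, and the copy logic is factored into a function that takes the client as a parameter so it can be exercised without AWS:

```python
import urllib.parse

DEST_BUCKET = "bucket-b"  # hypothetical destination bucket name


def copy_new_objects(event, s3_client):
    """Copy every object referenced in an S3 event record to DEST_BUCKET.

    The notification only fires after the upload (including a multipart
    upload) has completed, so the copy never sees a partial object.
    """
    copied = []
    for record in event["Records"]:
        src_bucket = record["s3"]["bucket"]["name"]
        # Keys in S3 event records are URL-encoded (spaces become '+')
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # boto3's managed copy transparently uses multipart copy,
        # which is required for objects larger than 5 GB
        s3_client.copy({"Bucket": src_bucket, "Key": key}, DEST_BUCKET, key)
        copied.append(key)
    return copied


def lambda_handler(event, context):
    import boto3
    return copy_new_objects(event, boto3.client("s3"))
```

Note the use of `copy` (the managed transfer) rather than `copy_object`, since the latter is capped at 5 GB per request.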




Answer 2:


You should definitely use an S3 event notification as the trigger for a Lambda function that copies your file from Bucket A to Bucket B. The trigger ensures that the copy starts only once the file has been uploaded completely.

Moreover, if you have further operations to perform, you can use AWS Step Functions to define the workflow of your processes, e.g. Process B starts 2 seconds after Process A, Processes C and D execute in parallel after Process B ends its execution, etc.
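The workflow described above could be sketched in Amazon States Language, shown here as a Python dict. The state names and Lambda ARNs are hypothetical placeholders:

```python
# Hypothetical Step Functions definition: ProcessA, then a 2-second
# Wait, then ProcessB, then ProcessC and ProcessD in parallel.
definition = {
    "StartAt": "ProcessA",
    "States": {
        "ProcessA": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:region:account:function:process-a",
            "Next": "WaitTwoSeconds",
        },
        "WaitTwoSeconds": {"Type": "Wait", "Seconds": 2, "Next": "ProcessB"},
        "ProcessB": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:region:account:function:process-b",
            "Next": "CAndDInParallel",
        },
        "CAndDInParallel": {
            "Type": "Parallel",
            "Branches": [
                {"StartAt": "ProcessC", "States": {"ProcessC": {
                    "Type": "Task",
                    "Resource": "arn:aws:lambda:region:account:function:process-c",
                    "End": True}}},
                {"StartAt": "ProcessD", "States": {"ProcessD": {
                    "Type": "Task",
                    "Resource": "arn:aws:lambda:region:account:function:process-d",
                    "End": True}}},
            ],
            "End": True,
        },
    },
}
```

Serialized to JSON, this is the shape a state machine definition takes when passed to `create_state_machine`.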




Answer 3:


I also do uploads of up to 40GB.

Since I do multipart uploads, I check whether the file I am writing to is closed. An S3 file (object) is only closed once all operations on it are finished.
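One way to express this "is the file still open" check with boto3 is to look for an in-progress multipart upload for the key: S3 lists a key under its pending multipart uploads until `CompleteMultipartUpload` is called. A minimal sketch, with the client injected so the usage below is hypothetical:

```python
def upload_in_progress(s3_client, bucket, key):
    """Return True if a multipart upload for `key` is still open.

    ListMultipartUploads reports uploads that have been initiated but
    not yet completed or aborted; an empty result for the key means
    the writer has finished (or never used a multipart upload).
    """
    resp = s3_client.list_multipart_uploads(Bucket=bucket, Prefix=key)
    return any(u["Key"] == key for u in resp.get("Uploads", []))
```

With a real client: `upload_in_progress(boto3.client("s3"), "bucket-a", "big-file.bin")`. Note the caveat in the docstring: this cannot distinguish "finished" from "never started", so it complements rather than replaces an existence check.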

Another way is to use an asynchronous task queue like Celery; you will get a notification when a task is done.

I now use Golang, but both of those methods have worked very well for me.



Source: https://stackoverflow.com/questions/50515323/python-boto3-how-to-check-if-s3-file-is-completely-written-before-process-star
