How to download the latest file of an S3 bucket using Boto3?

你离开我真会死。 提交于 2019-12-19 08:06:37

问题


The other questions I could find were refering to an older version of Boto. I would like to download the latest file of an S3 bucket. In the documentation I found that there is a method list_object_versions() that gets you a boolean IsLatest. Unfortunately I only managed to set up a connection and to download a file. Could you please show me how I can extend my code to get the latest file of the bucket? Thank you

import boto3
conn = boto3.client('s3',
                    region_name="eu-west-1",
                    endpoint_url="customendpoint",
                    config=Config(signature_version="s3", s3={'addressing_style': 'path'}))

From here I dont know how to get the latest added file from a bucket called mytestbucket. There are various csv files in the bucket but all of course with a different name.

Update:

import boto3
from botocore.client import Config

s3 = boto3.resource('s3', region_name="eu-west-1", endpoint_url="custom endpoint", aws_access_key_id = '1234', aws_secret_access_key = '1234', config=Config(signature_version="s3", s3={'addressing_style': 'path'}))
my_bucket = s3.Bucket('mytestbucket22')
unsorted = []
for file in my_bucket.objects.filter():
   unsorted.append(file)

files = [obj.key for obj in sorted(unsorted, key=get_last_modified, reverse=True)][0:9]

This gives me the following error:

NameError: name 'get_last_modified' is not defined

回答1:


Variation of the answer I provided for: Boto3 S3, sort bucket by last modified. You can modify the code to suit to your needs.

get_last_modified = lambda obj: int(obj['LastModified'].strftime('%s'))

s3 = boto3.client('s3')
objs = s3.list_objects_v2(Bucket='my_bucket')['Contents']
last_added = [obj['Key'] for obj in sorted(objs, key=get_last_modified)][0]

If you want to reverse the sort:

[obj['Key'] for obj in sorted(objs, key=get_last_modified, reverse=True)][0]



回答2:


You can do

import boto3

s3_client = boto3.client('s3')
response = s3_client.list_objects_v2(Bucket='bucket_name', Prefix='prefix')
all = response['Contents']        
latest = max(all, key=lambda x: x['LastModified'])



回答3:


This is basically the same answer as helloV in the case you use Session as I'm doing.

from boto3.session import Session
import settings

session = Session(aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
                          aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY)
s3 = session.resource("s3")

get_last_modified = lambda obj: int(obj.last_modified.strftime('%s'))


bckt = s3.Bucket("my_bucket")
objs = [obj for obj in bckt.objects.all()]

objs = [obj for obj in sorted(objs, key=get_last_modified)]
last_added = objs[-1].key

Having objs sorted allows you to quickly delete all files but the latest with

for obj in objs[:-1]:
    s3.Object("my_bucket", obj.key).delete()



回答4:


If you have a lot of files then you'll need to use pagination as mentioned by helloV. This is how I did it.

get_last_modified = lambda obj: int(obj['LastModified'].strftime('%s'))
s3 = boto3.client('s3')
paginator = s3.get_paginator( "list_objects" )
page_iterator = paginator.paginate( Bucket = "BucketName", Prefix = "Prefix")
for page in page_iterator:
    if "Contents" in page:
        last_added = [obj['Key'] for obj in sorted( page["Contents"], key=get_last_modified)][-1]



回答5:


You should be able to download the latest version of the file using default download file command

import boto3
import botocore

BUCKET_NAME = 'mytestbucket'
KEY = 'fileinbucket.txt'

s3 = boto3.resource('s3')

try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'downloadname.txt')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise

Reference link

To get the last modified or uploaded file you can use the following

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('myBucket')
unsorted = []
for file in my_bucket.objects.filter():
   unsorted.append(file)

files = [obj.key for obj in sorted(unsorted, key=get_last_modified, 
    reverse=True)][0:9]

As answer in this reference link states, its not the optimal but it works.



来源:https://stackoverflow.com/questions/45375999/how-to-download-the-latest-file-of-an-s3-bucket-using-boto3

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!