check if a key exists in a bucket in s3 using boto3

花落未央 2020-11-28 19:03

I would like to know whether a key exists in an S3 bucket using boto3. I could loop over the bucket contents and check whether each key matches.

But that seems longer and an overkill. Boto3 official

24 answers
  • 2020-11-28 19:44

    From https://www.peterbe.com/plog/fastest-way-to-find-out-if-a-file-exists-in-s3 this is pointed out to be the fastest method:

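    import boto3

    def s3_key_exists(bucket_name, s3_key):
        """Return True if s3_key is an object in bucket_name."""
        boto3_session = boto3.session.Session()
        s3_session_client = boto3_session.client("s3")
        # Prefix matching also returns longer keys that start with s3_key,
        # so each returned key is compared exactly.
        response = s3_session_client.list_objects_v2(
            Bucket=bucket_name, Prefix=s3_key
        )
        for obj in response.get("Contents", []):
            if obj["Key"] == s3_key:
                return True
        return False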
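
    A minimal sketch of the same listing approach, written so the client is passed in (which also makes it easy to try with a stub). The function name `key_exists` and its parameters are my own, not from the linked post. It assumes a general purpose bucket, where S3 returns keys in UTF-8 binary order, so asking for a single result is enough: an exact match always sorts before any longer key sharing the prefix.

```python
def key_exists(client, bucket, key):
    """Return True if `key` is an object in `bucket`, using one LIST request."""
    # MaxKeys=1 keeps the response small. Assumption: keys come back in
    # UTF-8 binary order, so an exact match precedes any longer key that
    # merely starts with `key`.
    response = client.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    # "Contents" is absent from the response when nothing matches the prefix.
    return any(obj["Key"] == key for obj in response.get("Contents", []))
```

    With a real client this would be called as key_exists(boto3.client("s3"), "my-bucket", "path/to/file.csv"), where the bucket and key are placeholders.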
    
  • 2020-11-28 19:45

    FWIW, here are the very simple functions that I am using

    import os

    import boto3

    def get_resource(config: dict = None):
        """Loads the s3 resource.

        Expects AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to be in the environment
        or in a config dictionary.
        Looks in the environment first."""
        # Avoid a mutable default argument; fall back to an empty config.
        config = config or {}
        s3 = boto3.resource('s3',
                            aws_access_key_id=os.environ.get(
                                "AWS_ACCESS_KEY_ID", config.get("AWS_ACCESS_KEY_ID")),
                            aws_secret_access_key=os.environ.get(
                                "AWS_SECRET_ACCESS_KEY", config.get("AWS_SECRET_ACCESS_KEY")))
        return s3
    
    
    def get_bucket(s3, s3_uri: str):
        """Get the bucket from the resource.
        A thin wrapper, use with caution.
    
        Example usage:
    
        >> bucket = get_bucket(get_resource(), s3_uri_prod)"""
        return s3.Bucket(s3_uri)
    
    
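    def isfile_s3(bucket, key: str) -> bool:
        """Returns T/F whether the file exists."""
        # Prefix filtering also returns longer keys (e.g. "a.csv.bak" for "a.csv"),
        # so look for an exact match instead of requiring exactly one result.
        return any(obj.key == key for obj in bucket.objects.filter(Prefix=key))


    def isdir_s3(bucket, key: str) -> bool:
        """Returns T/F whether the directory (key prefix) exists."""
        # Treat the key as a directory prefix; a single object under it is enough.
        prefix = key if key.endswith('/') else key + '/'
        return len(list(bucket.objects.filter(Prefix=prefix).limit(1))) > 0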
    
  • 2020-11-28 19:47

    There is a simple way to check whether a file exists in an S3 bucket, and it does not rely on exceptions:

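    import boto3

    # aws_access_key_id and aws_secret_access_key are assumed to be defined already.
    session = boto3.Session(aws_access_key_id, aws_secret_access_key)
    s3 = session.client('s3')

    object_name = 'filename'
    bucket = 'bucketname'
    obj_status = s3.list_objects(Bucket=bucket, Prefix=object_name)
    # Prefix also matches longer keys, so compare each returned key exactly.
    if any(obj['Key'] == object_name for obj in obj_status.get('Contents', [])):
        print("File exists")
    else:
        print("File does not exist")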
    
  • 2020-11-28 19:47

    Use this concise one-liner. It is less intrusive when you have to drop it into an existing project without modifying much of the code.

    s3_file_exists = lambda filename: bool(list(bucket.objects.filter(Prefix=filename)))
    

    The above function assumes the bucket variable has already been declared.

    You can extend the lambda to take the bucket as an additional parameter:

    s3_file_exists = lambda filename, bucket: bool(list(bucket.objects.filter(Prefix=filename)))
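
    One caveat with the one-liner: filter(Prefix=filename) returns every object whose key starts with filename, so it also reports True for "report.csv" when only "report.csv.bak" exists. A minimal sketch of an exact-match version, assuming bucket is a boto3 Bucket resource (the function form and the comparison are my addition, not part of the original one-liner):

```python
def s3_file_exists(bucket, filename):
    """Return True only if an object whose key equals `filename` exists."""
    # filter(Prefix=...) yields every object whose key starts with `filename`,
    # so compare each key instead of testing whether the listing is non-empty.
    return any(obj.key == filename for obj in bucket.objects.filter(Prefix=filename))
```

    It stops at the first exact match, so it does not need to materialize the whole listing.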
    
  • 2020-11-28 19:48

    If you have fewer than 1000 keys in a directory or bucket, you can fetch their file names as a set and then check whether the name you want is in that set:

    files_in_dir = {
        d['Key'].split('/')[-1]
        for d in s3_client.list_objects_v2(
            Bucket='mybucket', Prefix='my/dir').get('Contents') or []
    }
    

    This code works even if my/dir does not exist.

    http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.list_objects_v2
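
    The 1000-key limit comes from list_objects_v2 returning at most 1000 keys per call. For larger prefixes, a paginator can walk every page. A rough sketch, assuming client is a boto3 S3 client; the helper name keys_under_prefix is hypothetical:

```python
def keys_under_prefix(client, bucket, prefix):
    """Collect the base names of every key under `prefix`, across all pages."""
    names = set()
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is missing from a page when nothing matches the prefix.
        for obj in page.get("Contents", []):
            names.add(obj["Key"].split("/")[-1])
    return names
```

    Note that this lists the whole prefix, so for a single lookup on a large bucket it costs more requests than checking one key directly.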

  • 2020-11-28 19:50

    I noticed that catching botocore.exceptions.ClientError requires installing botocore, which takes up 36 MB of disk space. That matters especially for AWS Lambda functions. If we catch the generic Exception instead, we can skip the extra library.

    • I am validating that the file extension is '.csv'
    • This will not raise an exception if the bucket does not exist!
    • This will not raise an exception if the bucket exists but the object does not!
    • This raises an exception if the bucket is empty!
    • This raises an exception if the bucket has no permissions!

    The code looks like this. Please share your thoughts:

    import sys
    import traceback

    import boto3

    def download4mS3(s3bucket, s3Path, localPath):
        s3 = boto3.resource('s3')

        print('Looking for the csv data file ending with .csv in bucket: ' + s3bucket + ' path: ' + s3Path)
        if s3Path.endswith('.csv'):
            try:
                s3.Bucket(s3bucket).download_file(s3Path, localPath)
            except Exception as e:
                print(e)
                print(traceback.format_exc())
                # Only client errors carry a .response dict; other exceptions
                # have no such attribute, so look it up defensively.
                error_code = getattr(e, 'response', {}).get('Error', {}).get('Code')
                if error_code == "404":
                    print("Downloading the file from: [", s3Path, "] failed")
                    sys.exit(12)
                else:
                    raise
            print("Downloading the file from: [", s3Path, "] succeeded")
        else:
            print("csv file not found in: [", s3Path, "]")
            sys.exit(12)
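
    If the goal is only to test for existence without downloading, a HEAD request is an alternative. A minimal sketch, assuming client is a boto3 S3 client: boto3 clients expose the error class as client.exceptions.ClientError, so this needs no explicit botocore import. The function name object_exists is my own.

```python
def object_exists(client, bucket, key):
    """Return True if a HEAD request for the object succeeds, False on a 404."""
    try:
        client.head_object(Bucket=bucket, Key=key)
    except client.exceptions.ClientError as err:
        # HEAD responses carry no body, so a missing object is reported
        # with the bare status code "404" as the error code.
        if err.response["Error"]["Code"] == "404":
            return False
        # Anything else (for example "403" when access is denied) is re-raised.
        raise
    return True
```

    Unlike the generic except Exception above, this only inspects .response on errors that are known to have it.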
    