Python boto, list contents of specific dir in bucket

难免孤独 2020-12-13 18:19

I have S3 access only to a specific directory in an S3 bucket.

For example, with the s3cmd command if I try to list the whole bucket:

             


        
7 Answers
  • 2020-12-13 18:42

    I just had this same problem, and this code does the trick.

    import boto3
    
    s3 = boto3.resource("s3")
    s3_bucket = s3.Bucket("bucket-name")
    prefix = "dir-in-bucket/"
    # Keep only the part of each key that follows the directory prefix.
    files_in_s3 = [obj.key[len(prefix):] for obj in
                   s3_bucket.objects.filter(Prefix=prefix)]
    
  • 2020-12-13 18:46

    If you want to list all the objects of a folder in your bucket, you can specify it while listing.

    import boto
    conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
    bucket = conn.get_bucket(AWS_BUCKET_NAME)
    for key in bucket.list("FOLDER_NAME/", "/"):
        print(key.name)  # replace with whatever you need to do with each key
    
  • 2020-12-13 18:51

    For boto3

    import boto3
    
    s3 = boto3.resource('s3')
    my_bucket = s3.Bucket('my_bucket_name')
    
    for object_summary in my_bucket.objects.filter(Prefix="dir_name/"):
        print(object_summary.key)
    
  • 2020-12-13 18:51

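    This can be done with the boto3 client by passing the directory as Prefix:

    import boto3

    s3_client = boto3.client('s3')
    # Prefix restricts the listing to keys under the given directory.
    objects = s3_client.list_objects_v2(Bucket='bucket_name', Prefix='dir_name/')
    # 'Contents' is missing from the response when nothing matches.
    for obj in objects.get('Contents', []):
        print(obj['Key'])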
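
    A single list_objects_v2 call returns at most 1000 keys, so a larger directory needs the continuation token to be followed. The sketch below is one way to do that, not the answerer's code; the helper name iter_keys is hypothetical, and the client is passed in so the function works with any boto3 S3 client.

```python
def iter_keys(client, bucket, prefix):
    """Yield every key under `prefix`, following continuation tokens.

    `client` is assumed to be a boto3 S3 client (boto3.client("s3")).
    """
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = client.list_objects_v2(**kwargs)
        # "Contents" is absent when a page has no matching keys.
        for obj in response.get("Contents", []):
            yield obj["Key"]
        # IsTruncated is true while more pages remain.
        if not response.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = response["NextContinuationToken"]


# Usage sketch (needs boto3 and AWS credentials):
#   import boto3
#   for key in iter_keys(boto3.client("s3"), "bucket_name", "dir_name/"):
#       print(key)
```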
    
  • 2020-12-13 18:58

    By default, when you do a get_bucket call in boto it tries to validate that you actually have access to that bucket by performing a HEAD request on the bucket URL. In this case, you don't want boto to do that since you don't have access to the bucket itself. So, do this:

    bucket = conn.get_bucket('my-bucket-url', validate=False)
    

    and then you should be able to do something like this to list objects:

    for key in bucket.list(prefix='dir-in-bucket'):
        print(key.name)  # replace with your own handling
    

    If you still get a 403 error, try adding a slash at the end of the prefix:

    for key in bucket.list(prefix='dir-in-bucket/'):
        print(key.name)  # replace with your own handling
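
    One plausible reason the trailing slash matters: S3 matches prefixes as plain strings, so 'dir-in-bucket' also matches keys such as 'dir-in-bucket-old/...', and a policy scoped to 'dir-in-bucket/' may not cover that broader request. The helpers below (as_dir_prefix and keys_in_dir, both hypothetical names) only illustrate the string matching locally; they do not call S3.

```python
def as_dir_prefix(name):
    """Return `name` with a trailing slash so it matches one "directory" only."""
    if not name or name.endswith("/"):
        return name
    return name + "/"


def keys_in_dir(keys, name):
    """Filter `keys` the way S3 would for Prefix=as_dir_prefix(name)."""
    prefix = as_dir_prefix(name)
    return [key for key in keys if key.startswith(prefix)]


keys = ["dir-in-bucket/a.txt", "dir-in-bucket-old/b.txt"]
# Without the slash both keys match; with it only the intended directory does.
without_slash = [key for key in keys if key.startswith("dir-in-bucket")]
with_slash = keys_in_dir(keys, "dir-in-bucket")
```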
    

    Note: this answer was written about the boto version 2 module, which is obsolete by now. At the moment (2020), boto3 is the standard module for working with AWS. See this question for more info: What is the difference between the AWS boto and boto3

  • 2020-12-13 19:02

    Boto3 client:

    import boto3
    
    _BUCKET_NAME = 'mybucket'
    _PREFIX = 'subfolder/'
    
    client = boto3.client('s3', aws_access_key_id=ACCESS_KEY,
                                aws_secret_access_key=SECRET_KEY)
    
    def ListFiles(client):
        """List files in specific S3 URL"""
        response = client.list_objects(Bucket=_BUCKET_NAME, Prefix=_PREFIX)
        for content in response.get('Contents', []):
            yield content.get('Key')
    
    file_list = ListFiles(client)
    for file in file_list:
        print('File found: %s' % file)
    

    Using session

    from boto3.session import Session
    
    _BUCKET_NAME = 'mybucket'
    _PREFIX = 'subfolder/'
    
    session = Session(aws_access_key_id=ACCESS_KEY,
                      aws_secret_access_key=SECRET_KEY)
    
    client = session.client('s3')
    
    def ListFilesV1(client, bucket, prefix=''):
        """List files in specific S3 URL"""
        paginator = client.get_paginator('list_objects')
        for result in paginator.paginate(Bucket=bucket, Prefix=prefix,
                                         Delimiter='/'):
            for content in result.get('Contents', []):
                yield content.get('Key')
    
    file_list = ListFilesV1(client, _BUCKET_NAME, prefix=_PREFIX)
    for file in file_list:
        print('File found: %s' % file)
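
    Passing Delimiter='/' changes what comes back: Contents then holds only the keys directly under the prefix, while deeper levels are rolled up into CommonPrefixes. A minimal sketch of reading both, assuming a boto3 S3 client is passed in (list_dir is a hypothetical helper name, and it uses the list_objects_v2 paginator instead of list_objects):

```python
def list_dir(client, bucket, prefix):
    """Return (files, subdirs) found directly under `prefix`.

    `client` is assumed to be a boto3 S3 client (boto3.client("s3")).
    """
    files, subdirs = [], []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        # Keys that sit directly under the prefix.
        files.extend(obj["Key"] for obj in page.get("Contents", []))
        # One entry per "subdirectory", e.g. "subfolder/x/".
        subdirs.extend(cp["Prefix"] for cp in page.get("CommonPrefixes", []))
    return files, subdirs


# Usage sketch (needs boto3 and AWS credentials):
#   import boto3
#   files, subdirs = list_dir(boto3.client("s3"), "mybucket", "subfolder/")
```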
    