I have S3 access only to a specific directory in an S3 bucket.
For example, with the s3cmd
command if I try to list the whole bucket:
I just had this same problem, and this code does the trick.
import boto3
s3 = boto3.resource("s3")
s3_bucket = s3.Bucket("bucket-name")
dir = "dir-in-bucket"
files_in_s3 = [f.key.split(dir + "/")[1] for f in
s3_bucket.objects.filter(Prefix=dir).all()]
If you want to list all the objects of a folder in your bucket, you can specify it while listing.
import boto
conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
bucket = conn.get_bucket(AWS_BUCKET_NAME)
for file in bucket.list("FOLDER_NAME/", "/"):
<do something with required file>
For boto3
import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my_bucket_name')
for object_summary in my_bucket.objects.filter(Prefix="dir_name/"):
print(object_summary.key)
This can be done using:
s3_client = boto3.client('s3')
objects = s3_client.list_objects_v2(Bucket='bucket_name')
for obj in objects['Contents']:
print(obj['Key'])
By default, when you do a get_bucket
call in boto it tries to validate that you actually have access to that bucket by performing a HEAD
request on the bucket URL. In this case, you don't want boto to do that since you don't have access to the bucket itself. So, do this:
bucket = conn.get_bucket('my-bucket-url', validate=False)
and then you should be able to do something like this to list objects:
for key in bucket.list(prefix='dir-in-bucket'):
<do something>
If you still get a 403 Errror, try adding a slash at the end of the prefix.
for key in bucket.list(prefix='dir-in-bucket/'):
<do something>
Note: this answer was written about the boto version 2 module, which is obsolete by now. At the moment (2020), boto3 is the standard module for working with AWS. See this question for more info: What is the difference between the AWS boto and boto3
Boto3 client:
import boto3
_BUCKET_NAME = 'mybucket'
_PREFIX = 'subfolder/'
client = boto3.client('s3', aws_access_key_id=ACCESS_KEY,
aws_secret_access_key=SECRET_KEY)
def ListFiles(client):
"""List files in specific S3 URL"""
response = client.list_objects(Bucket=_BUCKET_NAME, Prefix=_PREFIX)
for content in response.get('Contents', []):
yield content.get('Key')
file_list = ListFiles(client)
for file in file_list:
print 'File found: %s' % file
Using session
from boto3.session import Session
_BUCKET_NAME = 'mybucket'
_PREFIX = 'subfolder/'
session = Session(aws_access_key_id=ACCESS_KEY,
aws_secret_access_key=SECRET_KEY)
client = session.client('s3')
def ListFilesV1(client, bucket, prefix=''):
"""List files in specific S3 URL"""
paginator = client.get_paginator('list_objects')
for result in paginator.paginate(Bucket=bucket, Prefix=prefix,
Delimiter='/'):
for content in result.get('Contents', []):
yield content.get('Key')
file_list = ListFilesV1(client, _BUCKET_NAME, prefix=_PREFIX)
for file in file_list:
print 'File found: %s' % file