I would like to check whether a key exists in an S3 bucket using boto3. I can loop over the bucket contents and check whether each key matches.
But that seems long-winded and overkill. Boto3 official
From https://www.peterbe.com/plog/fastest-way-to-find-out-if-a-file-exists-in-s3 this is pointed out to be the fastest method:
import boto3

boto3_session = boto3.session.Session()
s3_session_client = boto3_session.client("s3")

def key_exists(bc_df_caches_bucket, s3_key):
    # List the keys that share the prefix, then look for an exact match.
    response = s3_session_client.list_objects_v2(
        Bucket=bc_df_caches_bucket, Prefix=s3_key
    )
    for obj in response.get("Contents", []):
        if obj["Key"] == s3_key:
            return True
    return False
FWIW, here are the very simple functions that I am using
import os

import boto3

def get_resource(config: dict = None):
    """Loads the s3 resource.

    Expects AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to be in the environment
    or in a config dictionary.
    Looks in the environment first."""
    config = config or {}  # avoid a mutable default argument
    s3 = boto3.resource(
        's3',
        aws_access_key_id=os.environ.get(
            "AWS_ACCESS_KEY_ID", config.get("AWS_ACCESS_KEY_ID")),
        aws_secret_access_key=os.environ.get(
            "AWS_SECRET_ACCESS_KEY", config.get("AWS_SECRET_ACCESS_KEY")))
    return s3

def get_bucket(s3, s3_uri: str):
    """Get the bucket from the resource.

    A thin wrapper, use with caution.
    Example usage:
    >> bucket = get_bucket(get_resource(), s3_uri_prod)"""
    return s3.Bucket(s3_uri)
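def isfile_s3(bucket, key: str) -> bool:
    """Returns T/F whether the file exists."""
    # Other keys may share the prefix (e.g. "a.csv" and "a.csv.bak"),
    # so compare every match instead of requiring exactly one result.
    return any(obj.key == key for obj in bucket.objects.filter(Prefix=key))

def isdir_s3(bucket, key: str) -> bool:
    """Returns T/F whether the directory exists."""
    # A "directory" exists if at least one key lives under "key/".
    prefix = key if key.endswith("/") else key + "/"
    return any(True for _ in bucket.objects.filter(Prefix=prefix).limit(1))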
There is a simple way to check whether a file exists in an S3 bucket. We do not need to use exceptions for this:
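import boto3

# aws_access_key_id and aws_secret_access_key are assumed to be defined elsewhere
session = boto3.Session(
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key)
s3 = session.client('s3')
object_name = 'filename'
bucket = 'bucketname'
obj_status = s3.list_objects(Bucket=bucket, Prefix=object_name)
# Prefix also matches longer keys, so compare each returned key exactly
if any(obj['Key'] == object_name for obj in obj_status.get('Contents', [])):
    print("File exists")
else:
    print("File does not exist")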
Use this concise one-liner. It is less intrusive when you have to drop it into an existing project without modifying much of the code.
s3_file_exists = lambda filename: bool(list(bucket.objects.filter(Prefix=filename)))
The above function assumes the bucket variable was already declared.
You can extend the lambda to take the bucket as an additional parameter, like this:
s3_file_exists = lambda filename, bucket: bool(list(bucket.objects.filter(Prefix=filename)))
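One caveat with the one-liner: Prefix matches every key that starts with the given string, so asking about report.csv also returns True when only report.csv.bak exists. Below is a minimal sketch of an exact-match variant, assuming `bucket` is a boto3 Bucket resource. The name `s3_file_exists_exact` and the `Fake*` stand-in classes are hypothetical and exist only so the example runs without AWS access.

```python
from collections import namedtuple


def s3_file_exists_exact(filename, bucket):
    """Return True only if some object's key equals filename exactly."""
    # filter(Prefix=...) yields every key that starts with filename,
    # so each candidate is compared against the full key.
    return any(obj.key == filename for obj in bucket.objects.filter(Prefix=filename))


# Hypothetical stand-ins that mimic the small part of a boto3 Bucket used above.
FakeObject = namedtuple("FakeObject", "key")


class FakeObjects:
    def __init__(self, keys):
        self._keys = keys

    def filter(self, Prefix=""):
        return [FakeObject(k) for k in self._keys if k.startswith(Prefix)]


class FakeBucket:
    def __init__(self, keys):
        self.objects = FakeObjects(keys)


fake_bucket = FakeBucket(["data/report.csv.bak"])

# The prefix-only check reports a hit even though the exact key is absent.
prefix_hit = bool(list(fake_bucket.objects.filter(Prefix="data/report.csv")))
exact_hit = s3_file_exists_exact("data/report.csv", fake_bucket)
print(prefix_hit, exact_hit)
```

With a real bucket, the call would be `s3_file_exists_exact("data/report.csv", bucket)`; the cost is the same single listing request as the prefix-only version.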
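If you have fewer than 1000 objects in a directory or bucket (a single list_objects_v2 call returns at most 1000 keys), you can build a set of their file names and then check whether the name you want is in that set: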
files_in_dir = {
    d['Key'].split('/')[-1]
    for d in s3_client.list_objects_v2(
        Bucket='mybucket', Prefix='my/dir').get('Contents') or []
}
This code works even if my/dir does not exist; the set is simply empty.
http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.list_objects_v2
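For prefixes that may hold more than 1000 keys, the same idea can be extended with a paginator. This is a rough sketch, assuming `s3_client` is a boto3 S3 client created by the caller; the function name `key_names_under_prefix` is made up for illustration.

```python
def key_names_under_prefix(s3_client, bucket_name, prefix):
    """Collect the last path component of every key under prefix, across all pages."""
    names = set()
    # The paginator issues follow-up list_objects_v2 calls until the listing is complete.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # A page has no "Contents" entry when nothing matches the prefix.
        for item in page.get("Contents", []):
            names.add(item["Key"].split("/")[-1])
    return names
```

Usage would look like `"report.csv" in key_names_under_prefix(s3_client, "mybucket", "my/dir")`. Note that this lists every key under the prefix, so it only pays off when several names are checked against the same set.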
I noticed that catching the exception as botocore.exceptions.ClientError requires referring to botocore explicitly. botocore takes up 36M of disk space, which matters particularly in AWS Lambda functions. If we catch the generic Exception instead, we can skip the explicit dependency on that library!
The code looks like this. Please share your thoughts:
import sys
import traceback

import boto3

def download4mS3(s3bucket, s3Path, localPath):
    s3 = boto3.resource('s3')
    print('Looking for the csv data file ending with .csv in bucket: ' + s3bucket + ' path: ' + s3Path)
    if s3Path.endswith('.csv'):
        try:
            s3.Bucket(s3bucket).download_file(s3Path, localPath)
        except Exception as e:
            print(e)
            print(traceback.format_exc())
            # Only botocore's ClientError carries a response dict; other exceptions do not.
            response = getattr(e, 'response', None) or {}
            if response.get('Error', {}).get('Code') == "404":
                print("Downloading the file from: [", s3Path, "] failed")
                sys.exit(12)
            else:
                raise
        print("Downloading the file from: [", s3Path, "] succeeded")
    else:
        print("csv file not found in: [", s3Path, "]")
        sys.exit(12)
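The same generic-exception idea can be used for a pure existence check without downloading anything. The sketch below assumes `s3_client` is a boto3 S3 client passed in by the caller and uses its head_object call; the function name `object_exists` is hypothetical. It also assumes a missing key is reported with error code "404", which is what S3 returns when the caller is allowed to list the bucket (without that permission the error may be a 403 instead, which this sketch re-raises).

```python
def object_exists(s3_client, bucket_name, key):
    """Return True if head_object succeeds, False on a 404, and re-raise anything else."""
    try:
        # head_object fetches only metadata, so no object data is transferred.
        s3_client.head_object(Bucket=bucket_name, Key=key)
    except Exception as e:
        # botocore's ClientError exposes a `response` dict; other exceptions do not,
        # so fall back to an empty dict instead of risking an AttributeError.
        error = (getattr(e, "response", None) or {}).get("Error", {})
        if error.get("Code") == "404":
            return False
        raise
    return True
```

A call would look like `object_exists(s3_client, "mybucket", "my/dir/report.csv")`. Unlike the listing-based checks, this needs no prefix comparison because head_object addresses exactly one key.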