AWS S3 list keys containing a string

Submitted by 岁酱吖の on 2020-06-16 07:32:19

Question


I am using Python in an AWS Lambda function to list the keys in an S3 bucket that contain a specific ID:

for object in mybucket.objects.all():
    file_name = os.path.basename(object.key)
    match_id = file_name.split('_', 1)[0]

The problem is that if the S3 bucket holds several thousand files, the iteration is very inefficient and the Lambda function sometimes times out.

Here is an example file name:

https://s3.console.aws.amazon.com/s3/object/bucket-name/012345_abc_happy.jpg

I want to iterate only over objects whose key name contains "012345". Any good suggestions on how I can accomplish that?


Answer 1:


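Here is how you can solve it.

S3 stores everything as objects; there are no real folders or file names. The "/" hierarchy in a key exists only for user convenience.

aws s3 ls s3://bucket/folder1/folder2/filenamepart --recursive

lists every S3 object whose key starts with that prefix. Note that this is a prefix match, not an arbitrary substring match, so it works here because the ID sits at the start of the key. The same prefix filter in boto3: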

import boto3

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('bucketname')

# Prefix is applied by S3 itself, so only keys starting with '012345' are returned.
for obj in my_bucket.objects.filter(Prefix='012345'):
    print(obj.key)
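
The same prefix listing can also be written against the low-level client, which makes the paging explicit and yields plain key strings. This is a minimal sketch rather than part of the original answer: the helper name iter_keys_with_prefix is hypothetical, and the client is passed in as an argument so the function can be tried without AWS credentials.

```python
def iter_keys_with_prefix(s3_client, bucket, prefix):
    """Yield every key in `bucket` that starts with `prefix`.

    `s3_client` is assumed to behave like boto3.client("s3").
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is missing from a page that has no matching objects.
        for item in page.get("Contents", []):
            yield item["Key"]


# Possible usage (needs boto3 and valid credentials):
#   import boto3
#   for key in iter_keys_with_prefix(boto3.client("s3"), "bucketname", "012345"):
#       print(key)
```

Because the function is a generator, the Lambda handler can stop iterating as soon as it has what it needs.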

To speed up the listing further, you can run several listings in parallel, for example one per prefix.
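
One way to read the parallel suggestion is to split the work into disjoint prefixes and list them concurrently from a single process instead of from separate scripts. The sketch below assumes a hypothetical callable list_keys(prefix) that returns the keys for one prefix (for instance a thin wrapper around my_bucket.objects.filter(Prefix=...)). The function name and the max_workers default are illustrative choices, not something the original answer specifies.

```python
from concurrent.futures import ThreadPoolExecutor


def list_prefixes_in_parallel(list_keys, prefixes, max_workers=8):
    """Call list_keys(prefix) for each prefix concurrently.

    Returns a dict mapping each prefix to the list of keys found under it.
    `list_keys` is a caller-supplied (hypothetical) function.
    """
    prefixes = list(prefixes)  # allow generators; we iterate twice
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with prefixes.
        results = pool.map(lambda p: list(list_keys(p)), prefixes)
        return dict(zip(prefixes, results))


# Possible usage with the boto3 resource from the answer:
#   found = list_prefixes_in_parallel(
#       lambda p: (o.key for o in my_bucket.objects.filter(Prefix=p)),
#       ["012345", "012346"],
#   )
```

Threads are a reasonable fit here because each listing spends most of its time waiting on network I/O.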

Hope it helps.




Answer 2:


You can speed up the string matching itself by 30-40% by dropping os.path.basename() and using plain string methods.
Depending on the assumptions you can make about the file path string, you can get additional speedups:

Using os.path.basename():

%%timeit
match = "012345"
fname = "https://s3.console.aws.amazon.com/s3/object/bucket-name/012345_abc_happy.jpg"
os.path.basename(fname).split("_")[0] == match

# 1.03 µs ± 29.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Without os, splitting first on / and then on _:

%%timeit
match = "012345"
fname = "https://s3.console.aws.amazon.com/s3/object/bucket-name/012345_abc_happy.jpg"
fname.split("/")[-1].split("_")[0] == match

# 657 ns ± 11.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

If you know that the only underscores occur in the actual file name and that the ID is always six characters long, you can use just one split():

%%timeit
match = "012345"
fname = "https://s3.console.aws.amazon.com/s3/object/bucket-name/012345_abc_happy.jpg"
fname.split("_")[0][-6:] == match

# 388 ns ± 5.65 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
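
Inside the Lambda loop, object.key is already the key rather than a console URL, so the same string-method idea can be applied to it directly. The following is a small sketch, assuming the ID is everything before the first underscore of the last path segment; key_matches_id is a hypothetical helper name, not something from the original answer.

```python
def key_matches_id(key, match_id):
    """Return True if the file-name part of `key` starts with `match_id` + '_'."""
    # Keep only the last "/"-separated segment (the file name).
    file_name = key.rsplit("/", 1)[-1]
    # partition() splits once on the first underscore without building a full list.
    return file_name.partition("_")[0] == match_id


# Possible usage:
#   key_matches_id("012345_abc_happy.jpg", "012345")
```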


Source: https://stackoverflow.com/questions/47878893/aws-s3-list-keys-containing-a-string
