AWS CLI S3API find newest folder in path

Submitted by 久未见 on 2019-12-08 19:23:42

Question


I've got a very large bucket (hundreds of thousands of objects) and a path in it (let's say s3://myBucket/path1/path2). Uploads to /path2 are themselves folders, so a sample might look like:

s3://myBucket/path1/path2/v6.1.0
s3://myBucket/path1/path2/v6.1.1
s3://myBucket/path1/path2/v6.1.102
s3://myBucket/path1/path2/v6.1.2
s3://myBucket/path1/path2/v6.1.25
s3://myBucket/path1/path2/v6.1.99

S3 doesn't sort by version number (which makes sense), so the alphabetically last entry in the list is not the last one uploaded. In this example, .../v6.1.102 is the newest.

Here's what I've got so far:

aws s3api list-objects \
--bucket myBucket \
--query "sort_by(Contents[?contains(Key, \`path1/path2\`)], &LastModified)" \
--max-items 20000

One problem here is that --max-items seems to count alphabetically from the first key in the bucket, across all objects recursively. 20000 is enough to reach my files, but going through that many objects is slow.

So my questions are twofold:

1 - This still searches the whole bucket, but I only want to look under path2/. Can I narrow it down?

2 - This lists only objects. Is it possible to get just a list of paths instead?

Basically, the end goal is a single command that returns the newest folder name, such as 'v6.1.102' in the example above.


Answer 1:


To answer #1, you could add --prefix path1/path2 to limit what you're querying in the bucket.
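For #2, one option is to combine --prefix with the --delimiter parameter of list-objects-v2, which makes S3 group keys into CommonPrefixes (the "folders") instead of returning every object. The sketch below is only an illustration under some assumptions: the helper names list_folders and highest_version are hypothetical, the prefix must end with a slash for the grouping to happen one level down, and sort -V (version sort) is assumed to be available, as it is in GNU coreutils. Note that CommonPrefixes carry no LastModified, so this picks the highest version number rather than the most recent upload.

```shell
# Hypothetical helper: print the immediate "folders" under a prefix, one per line.
# $1 = bucket, $2 = prefix ending in "/", e.g. path1/path2/
list_folders() {
  aws s3api list-objects-v2 \
    --bucket "$1" \
    --prefix "$2" \
    --delimiter / \
    --query 'CommonPrefixes[].Prefix' \
    --output text | tr '\t' '\n' | grep -v '^None$'
}

# Hypothetical helper: read prefixes on stdin, strip the trailing slash and the
# leading path, then print the highest version (assumes GNU sort -V).
highest_version() {
  sed 's:/$::; s:.*/::' | sort -V | tail -n 1
}
```

With those two functions defined, `list_folders myBucket path1/path2/ | highest_version` would print the highest-numbered folder name.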

As for sorting by last modified, the only approach I can think of is to use an SDK (boto3), combining list_objects_v2 and head_object to get the last-modified time of each object and then sorting programmatically.

Update

Alternatively, you could reverse-sort by LastModified in JMESPath and return the first item, which gives you the most recent object; you can then derive the directory from its key.

aws s3api list-objects-v2 \
--bucket myBucket \
--prefix path1/path2 \
--query 'reverse(sort_by(Contents,&LastModified))[0]'
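Deriving the directory from that most recent object can be sketched as follows. The function names folder_of_key and newest_folder are hypothetical, and the prefix is assumed to end with a slash. The sketch uses --output json rather than text because, with text output, the CLI applies --query to each page of results separately, which would give one "newest" key per page on a large listing; stripping the JSON quotes with tr assumes the key contains no characters that JSON would escape.

```shell
# Hypothetical helper: given an object key and the prefix it was listed under,
# print the first path segment below the prefix (the "folder" name).
folder_of_key() {
  local key="$1" prefix="$2"
  local rest="${key#"$prefix"}"   # drop the leading prefix
  printf '%s\n' "${rest%%/*}"     # keep everything before the next slash
}

# Hypothetical wrapper: find the most recently modified object under the prefix
# and print the folder it lives in. $1 = bucket, $2 = prefix ending in "/".
newest_folder() {
  local bucket="$1" prefix="$2" key
  key=$(aws s3api list-objects-v2 \
    --bucket "$bucket" \
    --prefix "$prefix" \
    --query 'reverse(sort_by(Contents,&LastModified))[0].Key' \
    --output json | tr -d '"')
  folder_of_key "$key" "$prefix"
}
```

Calling `newest_folder myBucket path1/path2/` would then print a name such as v6.1.102. This still lists every object under the prefix, so it is only as fast as that listing.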



Answer 2:


If you want general-purpose querying (e.g. "lowest version", "highest version", "all v6.x versions"), then consider maintaining a separate database with the version numbers.

If you only need to know the highest version number and you need that to be retrieved quickly (quicker than a list object call) then you could maintain that version number independently. For example, you could use a Lambda function that responds to objects being uploaded to path1/path2 where the Lambda function is responsible for storing the highest version number that it has seen into a file at s3://mybucket/version.max.
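The core of such a function is a "keep the higher version" update. The following is a rough shell sketch of that logic only, not a Lambda implementation: max_version and update_max_version are hypothetical names, sort -V is assumed to be available (GNU coreutils), and the marker object is the s3://mybucket/version.max file mentioned above. It does not guard against two uploads updating the marker at the same time.

```shell
# Print the higher of two version strings (assumes GNU sort -V).
max_version() {
  printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1
}

# Hypothetical updater: compare a newly uploaded version against the stored
# maximum and write back whichever is higher. $1 = new version, e.g. v6.1.102
update_max_version() {
  local new="$1" marker="s3://mybucket/version.max" current
  # "aws s3 cp <s3-uri> -" streams the object to stdout; empty if it is missing.
  current=$(aws s3 cp "$marker" - 2>/dev/null || true)
  # "aws s3 cp - <s3-uri>" uploads stdin as the object body.
  max_version "${current:-$new}" "$new" | aws s3 cp - "$marker"
}
```

Reading the current maximum is then a single small download (`aws s3 cp s3://mybucket/version.max -`) instead of a listing call.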



Source: https://stackoverflow.com/questions/47229999/aws-cli-s3api-find-newest-folder-in-path
