How do you search an Amazon S3 bucket?

渐次进展 2020-11-30 18:00

I have a bucket with thousands of files in it. How can I search the bucket? Is there a tool you can recommend?

21 Answers
  • 2020-11-30 18:06

    There are (at least) two different use cases which could be described as "search the bucket":

    1. Search for something inside every object stored in the bucket; this assumes a common format for all the objects in that bucket (say, text files). For something like this, you're forced to do what Cody Caughlan just answered. The AWS S3 docs have example code showing how to do this with the AWS SDK for Java: Listing Keys Using the AWS SDK for Java (there you'll also find PHP and C# examples).

    2. Search for something in the object keys contained in that bucket; S3 does have partial support for this, in the form of exact prefix matches plus collapsing of matches after a delimiter. This is explained in more detail in the AWS S3 Developer Guide. This makes it possible, for example, to implement "folders" by using object keys such as

      folder/subfolder/file.txt

      If you follow this convention, most of the S3 GUIs (such as the AWS Console) will show you a folder view of your bucket.
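
    Both use cases can be sketched in Python. This is only an illustration, not the Java sample from the AWS docs: it assumes the boto3 SDK, and the function names and the bucket name "my-bucket" are made up for the example. The functions take the S3 client as an argument, so the block itself runs without AWS credentials.

```python
from typing import Iterator, List


def iter_keys(s3, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every object key in `bucket` that starts with `prefix`."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def list_folders(s3, bucket: str, prefix: str = "", delimiter: str = "/") -> List[str]:
    """Use case 2: return the "subfolders" directly under `prefix`."""
    paginator = s3.get_paginator("list_objects_v2")
    folders: List[str] = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter=delimiter):
        # Keys that share a prefix up to the delimiter are collapsed by S3.
        folders.extend(cp["Prefix"] for cp in page.get("CommonPrefixes", []))
    return folders


def grep_objects(s3, bucket: str, needle: str, prefix: str = "") -> List[str]:
    """Use case 1: download each object and return keys whose text contains `needle`."""
    hits: List[str] = []
    for key in iter_keys(s3, bucket, prefix):
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        if needle in body.decode("utf-8", errors="replace"):
            hits.append(key)
    return hits


# Usage (needs boto3 and AWS credentials; "my-bucket" is a placeholder):
#   import boto3
#   s3 = boto3.client("s3")
#   print(list_folders(s3, "my-bucket", prefix="folder/"))
#   print(grep_objects(s3, "my-bucket", "ERROR", prefix="folder/subfolder/"))
```

    Note that grep_objects downloads every matching object, so on a bucket with thousands of files it helps to narrow the prefix first.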

  • 2020-11-30 18:07

    Take a look at this documentation: http://docs.aws.amazon.com/AWSSDKforPHP/latest/index.html#m=amazons3/get_object_list

    You can use a Perl-Compatible Regular Expression (PCRE) to filter the names.
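
    The same idea, filtering listed names with a regular expression, can be sketched in Python. This is an assumption-laden illustration rather than the PHP SDK call linked above: it uses Python's re module (similar to PCRE for common patterns, but not identical), and the key names are hypothetical stand-ins for a real bucket listing. S3's list API itself only filters by prefix, so a regex filter like this runs on the client over the names that were listed.

```python
import re
from typing import Iterable, List


def filter_keys(keys: Iterable[str], pattern: str) -> List[str]:
    """Return the keys where `pattern` matches anywhere in the key."""
    regex = re.compile(pattern)
    return [key for key in keys if regex.search(key)]


# Hypothetical key names, standing in for a real bucket listing.
keys = [
    "logs/2020-11-30/app.log",
    "logs/2020-12-01/app.log",
    "images/logo.png",
    "reports/2020-11.csv",
]

november_logs = filter_keys(keys, r"^logs/2020-11-\d{2}/.*\.log$")
print(november_logs)  # ['logs/2020-11-30/app.log']
```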

  • 2020-11-30 18:08

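    This is a slightly old thread, but maybe this will help someone who is still searching. I spent a year looking for this myself.

    A solution may be "AWS Athena", where you can search over data with SQL. The example below uses the S3Object table name from Amazon S3 Select, the feature covered by the linked article: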

    'SELECT user_name FROM S3Object WHERE cast(age as int) > 20'
    

    https://aws.amazon.com/blogs/developer/introducing-support-for-amazon-s3-select-in-the-aws-sdk-for-javascript/

    Currently pricing is $5 per 1 TB of data scanned, so, for example, if your query searches over one 1 TB file 3 times, your cost is $15. But if the data is in a "converted columnar format" and you only want to read 1 column that makes up a third of it, you'll pay 1/3 of the price, meaning $1.67/TB.
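
    The arithmetic above can be written out as a small sketch. The $5 per TB figure is the one quoted in this answer and may no longer be current; the function name is made up for the example.

```python
PRICE_PER_TB_USD = 5.0  # figure quoted in the answer above; check current pricing


def query_cost(tb_scanned_per_query: float, runs: int = 1) -> float:
    """Cost in USD when each run scans `tb_scanned_per_query` terabytes."""
    return round(tb_scanned_per_query * runs * PRICE_PER_TB_USD, 2)


print(query_cost(1.0, runs=3))  # 15.0  (one 1 TB file, queried 3 times)
print(query_cost(1.0 / 3))      # 1.67  (only a third of the data is read)
```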

  • 2020-11-30 18:13

    Another option is to mirror the S3 bucket on your web server and traverse it locally. The trick is that the local files are empty and only used as a skeleton. Alternatively, the local files could hold useful metadata that you would normally need to get from S3 (e.g. filesize, mimetype, author, timestamp, uuid). When you provide a URL to download the file, search locally but provide a link to the S3 address.

    Local file traversal is easy, and this approach to S3 management is language agnostic. It also avoids maintaining and querying a database of files, and avoids the delays of making a series of remote API calls to authenticate and get the bucket contents.

    You could allow users to upload files directly to your server via FTP or HTTP and then transfer a batch of new and updated files to Amazon at off-peak times by just recursing over the directories for files with a nonzero size. On completion of a file transfer to Amazon, replace the web server file with an empty one of the same name. If a local file has a nonzero filesize, serve it directly because it's awaiting batch transfer.
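
    A minimal sketch of the skeleton idea, using only the Python standard library. The function names, the example keys, and the bucket name "my-bucket" are hypothetical, and the URL format assumes a publicly readable bucket addressed in the virtual-hosted style.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List
from urllib.parse import quote


def build_skeleton(root: Path, keys: Iterable[str]) -> None:
    """Create an empty local file for every S3 key."""
    for key in keys:
        path = root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()


def search_skeleton(root: Path, term: str) -> List[str]:
    """Return the keys whose file name contains `term`."""
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and term in p.name
    )


def pending_uploads(root: Path) -> List[str]:
    """Files with a nonzero size are still awaiting transfer to S3."""
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_size > 0
    )


def s3_url(bucket: str, key: str) -> str:
    """Build a download link that points at S3 instead of the local file."""
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_skeleton(root, ["docs/report.pdf", "docs/notes.txt", "img/logo.png"])

    # A freshly uploaded file that has not been moved to S3 yet.
    (root / "incoming").mkdir()
    (root / "incoming" / "new.bin").write_bytes(b"payload")

    print(search_skeleton(root, "report"))   # ['docs/report.pdf']
    print(pending_uploads(root))             # ['incoming/new.bin']
    print(s3_url("my-bucket", "docs/report.pdf"))
```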

  • 2020-11-30 18:13

    Status 2018-07: Amazon does have native SQL-like search for CSV and JSON files!

    https://aws.amazon.com/blogs/developer/introducing-support-for-amazon-s3-select-in-the-aws-sdk-for-javascript/
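
    The linked article uses the JavaScript SDK. As a rough equivalent, here is a sketch in Python that assumes the boto3 client's select_object_content call and a CSV object with a header row; the function name, bucket, and key are placeholders. S3 Select queries one object per call, so searching a whole bucket still means listing the keys and looping over them.

```python
def select_rows(s3, bucket: str, key: str, expression: str) -> str:
    """Run an S3 Select SQL expression against one CSV object and return the result text."""
    response = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        # Assumes the first CSV line holds the column names.
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    # The response payload is an event stream; only "Records" events carry data.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks).decode("utf-8")


# Usage (needs boto3 and AWS credentials; bucket and key are placeholders):
#   import boto3
#   s3 = boto3.client("s3")
#   print(select_rows(
#       s3, "my-bucket", "users.csv",
#       "SELECT user_name FROM S3Object WHERE cast(age as int) > 20",
#   ))
```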

  • 2020-11-30 18:14

    Just a note to add on here: it's now 3 years later, yet this post is top in Google when you type in "How to search an S3 Bucket."

    Perhaps you're looking for something more complex, but if you landed here trying to figure out how to simply find an object (file) by its title, it's crazy simple:

    Open the bucket, select "none" on the right-hand side, and start typing in the file name.

    http://docs.aws.amazon.com/AmazonS3/latest/UG/ListingObjectsinaBucket.html
