How to use AWS CLI to only copy files in S3 bucket that match a given string pattern

◇◆丶佛笑我妖孽 提交于 2019-12-03 06:29:22

The alternatives that you have listed are the best options because S3 CLI doesn't support regex.

Use of Exclude and Include Filters:

Currently, there is no support for the use of UNIX style wildcards in a command's path arguments. However, most commands have --exclude "" and --include "" parameters that can achieve the desired result. These parameters perform pattern matching to either exclude or include a particular file or object. The following pattern symbols are supported.

*: Matches everything
?: Matches any single character
[sequence]: Matches any character in sequence
[!sequence]: Matches any character not in sequence

Putting this here for others to find, since I just had to figure this out. Here's what I came up with:

s3cmd del $(s3cmd ls s3://[BUCKET]/ | grep '.*s3://[BUCKET]/[FILENAME]' | cut -c 41-)

You can put the regex in the grep search string. For instance, I was searching for specific files to delete (hence the s3cmd del). My regex looked like: '2016-11-04.*s3.*[DN][RS].*'. You may have to adjust the cut for your use. Should also work with s3cmd get.

here is the same solution for deletion, you can replace rm with cp you can do it using aws cli : https://aws.amazon.com/cli/ and some unix command.

this aws cli commands should work:

aws s3 rm s3://<your_bucket_name> --exclude "*" --include "<your_regex>"

if you want to include sub-folders you should add the flag --recursive

or with unix commands:

aws s3 ls s3://<your_bucket_name>/ | awk '{print $4}' | xargs -I%  <your_os_shell>   -c 'aws s3 rm s3:// <your_bucket_name>/% $1'

explanation:

  1. list all files on the bucket --pipe-->
  2. get the 4th parameter(its the file name) --pipe--> // you can replace it with linux command to match your pattern
  3. run delete script with aws cli
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!