How to grep a term from S3 and output object name

Submitted by 谁说我不能喝 on 2021-02-20 04:49:07

Question


I need to grep for a term across thousands of files in S3 and write the names of the matching files to an output file. I'm quite new to the CLI, so I've been testing both locally and on a small subset in S3.

So far I've got this:

aws s3 cp s3://mybucket/path/to/file.csv - | grep -iln searchterm > output.txt

The problem is the hyphen: since I'm copying to standard output, grep's -l switch returns (standard input) instead of file.csv.

My desired output is

file.csv

Eventually, I'll need to iterate this over the whole bucket, and then all buckets, to get

file1.csv
file2.csv
file3.csv

But I need to get over this hurdle first. Thanks!


Answer 1:


Because you write the file to stdout and pipe that into grep's stdin, grep has no way of knowing that the data originally came from file.csv. If you have a long list of files, I would do:

while read -r file; do aws s3 cp "s3://mybucket/path/to/${file}" - | grep -q searchterm && echo "${file}" >> output.txt; done < files_list.txt

I could not test this, because I do not have access to an S3 bucket, but the trick is to run grep quietly (-q): it exits with status 0 (true) if it finds at least one match and non-zero (false) otherwise, so you can then print the name of the file yourself.
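The exit-status behaviour of grep -q can be checked locally without touching S3. The following is a minimal sketch in which printf stands in for the aws s3 cp ... - stream; the function name name_if_match is hypothetical, and file.csv and searchterm are the placeholders from the question.

```shell
# Print the given name ($1) if the data on stdin contains the term ($2).
# -i matches case-insensitively, -q suppresses output so only the exit status is used.
name_if_match() {
    if grep -iq -- "$2"; then
        printf '%s\n' "$1"
    fi
}

# printf stands in for: aws s3 cp "s3://mybucket/path/to/file.csv" -
printf 'a,b\nfoo,SearchTerm\n' | name_if_match file.csv searchterm
```

Because the name is passed in by the caller, grep never needs to know where its input came from.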

EDIT: Explanation

  1. The while loop iterates over each line of files_list.txt.
  2. The aws command writes that file to stdout.
  3. We pipe stdout into grep in quiet mode (-q), which acts as a pattern matcher, returning true if a match was found and false otherwise.
  4. If grep returns true, we append the name of the file (${file}) to our output file.
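The steps above assume that files_list.txt already exists. As a hedged sketch of how the listing and the loop could be combined, the function below asks aws s3api list-objects-v2 for the keys under a prefix and greps each object in turn. The function name grep_s3_prefix is hypothetical, the bucket and prefix are the placeholders from the question, and an empty listing is not handled specially here.

```shell
#!/usr/bin/env bash
# Sketch: print the key of every object under a prefix whose body contains a term.
# Arguments: bucket, prefix, search term (all supplied by the caller).
grep_s3_prefix() {
    local bucket="$1" prefix="$2" term="$3" key
    # --query 'Contents[].[Key]' with --output text yields one key per line.
    aws s3api list-objects-v2 --bucket "$bucket" --prefix "$prefix" \
        --query 'Contents[].[Key]' --output text |
    while IFS= read -r key; do
        # Stream the object to stdout; </dev/null stops aws from reading the key list.
        if aws s3 cp "s3://${bucket}/${key}" - </dev/null | grep -iq -- "$term"; then
            printf '%s\n' "$key"
        fi
    done
}
```

A call such as grep_s3_prefix mybucket path/to/ searchterm > output.txt would then collect the matching keys. Note that the keys printed include the prefix (for example path/to/file1.csv), not just the base name.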

EDIT2: Other solution

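while read -r file; do aws s3 cp "s3://mybucket/path/to/${file}" - | sed -n "/searchpattern/{s|.*|${file}|p;q}" >> output.txt; done < files_list.txt

Explanation

Steps 1 and 2 are the same, then:

  1. stdout is piped into sed, which reads the stream line by line until it reaches the first line matching the pattern, replaces that line with the file name, prints it (p) to the output file, and quits (q). GNU sed's F command is not usable here: for piped input it prints - rather than the object name, so the name is substituted in from the shell variable instead. The sed script must also be quoted, otherwise the shell treats the ; as a command separator. This assumes the file name contains no |, & or backslash characters.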
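If GNU grep is available, another option is its --label option, which gives standard input a display name so that -l prints that name instead of (standard input). This is not one of the solutions above, and it is an assumption about your environment. A minimal sketch, with printf standing in for the aws s3 cp ... - stream and the hypothetical function name label_if_match:

```shell
# Print the given name ($1) if stdin contains the term ($2), letting grep emit the name.
# --label is a GNU grep option; other grep implementations may not support it.
label_if_match() {
    grep -il --label="$1" -- "$2"
}

# printf stands in for: aws s3 cp "s3://mybucket/path/to/file.csv" -
printf 'a,b\nfoo,SearchTerm\n' | label_if_match file.csv searchterm
```

Inside the loop, this would take the place of the grep -q ... && echo pair, with "${file}" passed as the label.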


Source: https://stackoverflow.com/questions/42707646/how-to-grep-a-term-from-s3-and-output-object-name
