Is there a way to iterate through s3 object content using a SQL expression?

Submitted by 半城伤御伤魂 on 2020-06-28 14:06:49

Question


I would like to iterate over every object in an S3 bucket and use a SQL expression to find all the content that matches it.

I was able to write a Python script that lists all the objects in my bucket.

import boto3

# List every object key in the bucket.
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucketname')
startAfter = 'bucketname/directory'  # defined but never used below
for obj in bucket.objects.all():
    print(obj.key)

I was also able to write a Python script that uses a SQL expression to search an object's content.

import boto3

S3_BUCKET = 'bucketname'

s3 = boto3.client('s3')

var1 = 'aj9c03869'
var2 = 'b3bu11043'

# Run an S3 Select query against a single gzip-compressed JSON object.
r = s3.select_object_content(
    Bucket=S3_BUCKET,
    Key='name_of_object',
    ExpressionType='SQL',
    # %r wraps each value in single quotes, which S3 Select reads as string literals.
    Expression='select * from s3object s where s."serialnumber" in (%r,%r)' % (var1, var2),
    InputSerialization={
        'CompressionType': 'GZIP',
        'JSON': {'Type': 'DOCUMENT'},
    },
    OutputSerialization={'JSON': {}},
)

# The response is an event stream; matching rows arrive in 'Records' events.
for event in r['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(records)

I would like to write a loop that goes through every object in the bucket, runs the SQL expression against it, and returns all the matches.

--Edit:

The reason I am trying to query all the objects is to find specific data within them and delete it. I appreciate the answers about Athena, but I don't think it would work in my case.


Answer 1:


Take a look at Amazon Athena – Interactive SQL Queries for Data in Amazon S3




Answer 2:


You may want to look at S3 Batch Operations, which lets you run your Python code (deployed to Lambda) against all the objects in your bucket(s).

With this solution, you won't need to list the objects in the bucket but can have AWS run your script on all of your objects.
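As a rough illustration of what such a Lambda function could look like, here is a minimal sketch. It assumes the S3 Batch Operations invocation event carries a `tasks` list with `taskId`, `s3Key` (URL-encoded) and `s3BucketArn`, and that the function answers with one result per task; check these field names against the current AWS documentation before relying on them. `make_handler` and `process_object` are hypothetical names introduced here, not part of any AWS API.

```python
from urllib.parse import unquote_plus


def make_handler(process_object):
    """Wrap a per-object function in an S3 Batch Operations style handler.

    process_object(bucket, key) is a hypothetical callable supplied by you,
    for example a function that runs the S3 Select query on one object.
    """

    def lambda_handler(event, context):
        results = []
        for task in event["tasks"]:
            # The bucket name is the last part of an ARN like arn:aws:s3:::name.
            bucket = task["s3BucketArn"].split(":::")[-1]
            # Keys are assumed to arrive URL-encoded in the event.
            key = unquote_plus(task["s3Key"])
            try:
                summary = process_object(bucket, key)
                code, message = "Succeeded", str(summary)
            except Exception as exc:
                # Report the failure for this task instead of crashing the batch.
                code, message = "PermanentFailure", str(exc)
            results.append(
                {"taskId": task["taskId"], "resultCode": code, "resultString": message}
            )
        return {
            "invocationSchemaVersion": event["invocationSchemaVersion"],
            "treatMissingKeysAs": "PermanentFailure",
            "invocationId": event["invocationId"],
            "results": results,
        }

    return lambda_handler


# Hypothetical wiring inside the deployed module:
# lambda_handler = make_handler(my_select_function)
```

Passing the per-object function in as an argument keeps the handler free of AWS calls, so it can be exercised locally with a hand-built event before it is deployed.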




Answer 3:


S3 Select is also an option, but Athena would be easier.
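If S3 Select is the route taken, the loop the question asks for could look roughly like the sketch below. It pages through the bucket with the `list_objects_v2` paginator and runs `select_object_content` on each key, assuming every object is gzip-compressed JSON as in the question. The helper names (`build_expression`, `select_matches`, `search_bucket`) are made up for this example, and `s3` is expected to be a `boto3.client('s3')` created by the caller.

```python
import json


def build_expression(serials):
    """Build the S3 Select SQL for a list of serial numbers."""
    # Double any single quote so a value cannot break out of its SQL literal.
    quoted = ", ".join("'" + s.replace("'", "''") + "'" for s in serials)
    return 'select * from s3object s where s."serialnumber" in (%s)' % quoted


def select_matches(s3, bucket, key, expression):
    """Run the query on one object and return the matching rows as dicts."""
    response = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        InputSerialization={"CompressionType": "GZIP", "JSON": {"Type": "DOCUMENT"}},
        OutputSerialization={"JSON": {}},
    )
    # A record can be split across events, so join the bytes before decoding.
    chunks = [e["Records"]["Payload"] for e in response["Payload"] if "Records" in e]
    text = b"".join(chunks).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def search_bucket(s3, bucket, serials, prefix=None):
    """Return {key: [matching rows]} for every object that has a match."""
    expression = build_expression(serials)
    list_args = {"Bucket": bucket}
    if prefix:
        list_args["Prefix"] = prefix
    matches = {}
    for page in s3.get_paginator("list_objects_v2").paginate(**list_args):
        for obj in page.get("Contents", []):
            rows = select_matches(s3, bucket, obj["Key"], expression)
            if rows:
                matches[obj["Key"]] = rows
    return matches


# Example call (needs AWS credentials and boto3):
# import boto3
# found = search_bucket(boto3.client("s3"), "bucketname", ["aj9c03869", "b3bu11043"])
```

Note that S3 Select only reads data. Removing the matched records would still mean downloading each affected object, filtering it, and uploading the result in its place.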



Source: https://stackoverflow.com/questions/56069436/is-there-a-way-to-iterate-through-s3-object-content-using-a-sql-expression
