Read a file line by line from S3 using boto?

刺人心 2020-11-29 07:32

I have a CSV file in S3 and I'm trying to read the header line to get the size (these files are created by our users so they could be almost any size). Is there a way to do this using boto?

10 Answers
  • 2020-11-29 08:05

    I know it's a very old question.

    But as of now, we can just use s3_conn.get_object(Bucket=bucket, Key=key)['Body'].iter_lines()
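
    For example, a minimal sketch using iter_lines() (the bucket and key names here are placeholders, and the object is assumed to be UTF-8 text):

        import boto3

        s3_conn = boto3.client('s3')
        response = s3_conn.get_object(Bucket='my-bucket', Key='users.csv')

        # iter_lines() yields one line at a time as bytes,
        # without downloading the whole object first
        for line in response['Body'].iter_lines():
            print(line.decode('utf-8'))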

  • 2020-11-29 08:06

    Using boto3:

        import boto3

        s3 = boto3.resource('s3')
        obj = s3.Object(BUCKET, key)
        # note: _raw_stream is the underlying urllib3 response and a
        # private attribute; the public iter_lines() is a safer choice
        for line in obj.get()['Body']._raw_stream:
            print(line)  # do something with line
  • 2020-11-29 08:10

    If you want to read multiple files (line by line) under a specific bucket prefix (i.e., in a "subfolder"), you can do this:

        import boto3

        # hardcoded credentials are shown for completeness; prefer IAM roles
        # or environment-based credentials in real code
        s3 = boto3.resource('s3', aws_access_key_id='<key_id>', aws_secret_access_key='<access_key>')
        bucket = s3.Bucket('<bucket_name>')
        for obj in bucket.objects.filter(Prefix='<your prefix>'):
            for line in obj.get()['Body'].read().splitlines():
                print(line.decode('utf-8'))

    Here each line comes back as bytes, so I decode it to a string. Note that .read() loads each whole object into memory before splitting, so this approach is simple but not truly streaming.
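
    If the files can be large, here is a sketch of the same prefix loop that streams with iter_lines() instead of buffering with read() (bucket and prefix names are placeholders):

        import boto3

        s3 = boto3.resource('s3')
        bucket = s3.Bucket('<bucket_name>')
        for obj in bucket.objects.filter(Prefix='<your prefix>'):
            # iter_lines() streams the body line by line as bytes
            for line in obj.get()['Body'].iter_lines():
                print(line.decode('utf-8'))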

  • 2020-11-29 08:12

    Here's a solution which actually streams the data line by line:

        import boto3
        from gzip import GzipFile
        from io import TextIOWrapper

        s3 = boto3.client('s3')

        # get_object returns a botocore.response.StreamingBody under 'Body'
        response = s3.get_object(Bucket=bucket, Key=key)
        # if the object is gzipped, wrap the stream so it is
        # decompressed on the fly
        gzipped = GzipFile(None, 'rb', fileobj=response['Body'])
        data = TextIOWrapper(gzipped)

        for line in data:
            print(line, end='')  # process line
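    For a plain-text (non-gzipped) object, a sketch that skips the GzipFile layer (this assumes a botocore version where StreamingBody is file-like; recent releases subclass io.IOBase):

        # wrap the body directly for text decoding
        data = TextIOWrapper(response['Body'], encoding='utf-8')
        for line in data:
            print(line, end='')  # process line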