How do I read a csv stored in S3 with csv.DictReader?

淺唱寂寞╮ 提交于 2020-01-13 08:07:29

问题


I have code that fetches an AWS S3 object. How do I read this StreamingBody with Python's csv.DictReader?

import boto3, csv

session = boto3.session.Session(aws_access_key_id=<>, aws_secret_access_key=<>, region_name=<>)
s3_resource = session.resource('s3')
s3_object = s3_resource.Object(<bucket>, <key>)
streaming_body = s3_object.get()['Body']

#csv.DictReader(???)

回答1:


The code would be something like this:

import boto3
import csv

# get a handle on s3
s3 = boto3.resource(u's3')

# get a handle on the bucket that holds your file
bucket = s3.Bucket(u'bucket-name')

# get a handle on the object you want (i.e. your file)
obj = bucket.Object(key=u'test.csv')

# get the object
response = obj.get()

# read the contents of the file and split it into a list of lines

# for python 2:
lines = response[u'Body'].read().split()

# for python 3 you need to decode the incoming bytes:
lines = response['Body'].read().decode('utf-8').split()

# now iterate over those lines
for row in csv.DictReader(lines):

    # here you get a sequence of dicts
    # do whatever you want with each line here
    print(row)

You can compact this a bit in actual code, but I tried to keep it step-by-step to show the object hierarchy with boto3.

Edit Per your comment about avoiding reading the entire file into memory: I haven't run into that requirement so cant speak authoritatively, but I would try wrapping the stream so I could get a text file-like iterator. For example you could use the codecs library to replace the csv parsing section above with something like:

for row in csv.DictReader(codecs.getreader('utf-8')(response[u'Body'])):
    print(row)


来源:https://stackoverflow.com/questions/42312196/how-do-i-read-a-csv-stored-in-s3-with-csv-dictreader

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!