Writing a file to S3 using Lambda in Python with AWS

Submitted by 我只是一个虾纸丫 on 2019-12-11 06:07:38

Question


In AWS, I'm trying to save a file to S3 in Python using a Lambda function. While this works on my local computer, I am unable to get it to work in Lambda. I've been working on this problem for most of the day and would appreciate help. Thank you.

def pdfToTable(PDFfilename, apiKey, fileExt, bucket, key):

    # parsing a PDF using an API
    fileData = (PDFfilename, open(PDFfilename, "rb"))
    files = {"f": fileData}
    postUrl = "https://pdftables.com/api?key={0}&format={1}".format(apiKey, fileExt)
    response = requests.post(postUrl, files=files)
    response.raise_for_status()

    # this code is probably the problem!
    s3 = boto3.resource('s3')
    bucket = s3.Bucket('transportation.manifests.parsed')
    with open('/tmp/output2.csv', 'rb') as data:
        data.write(response.content)
        key = 'csv/' + key
        bucket.upload_fileobj(data, key)

    # FYI, on my own computer, this saves the file
    with open('output.csv', "wb") as f:
        f.write(response.content)

In S3, there is a bucket transportation.manifests.parsed containing the folder csv where the file should be saved.

The type of response.content is bytes.

From AWS, the error from the current set-up above is "[Errno 2] No such file or directory: '/tmp/output2.csv': FileNotFoundError". In fact, my goal is to save the file to the csv folder under a unique name, so /tmp/output2.csv might not be the best approach. Any guidance?

In addition, I've tried using wb and w instead of rb, also to no avail. The error with wb is "Input <_io.BufferedWriter name='/tmp/output2.csv'> of type: <class '_io.BufferedWriter'> is not supported". The documentation suggests that 'rb' is the recommended mode, but I do not understand why that would be the case.

I've also tried s3_client.put_object(Key=key, Body=response.content, Bucket=bucket), but I receive "An error occurred (404) when calling the HeadObject operation: Not Found".


Answer 1:


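With 'wb', you have a writable stream that you're asking boto3 to use as a readable stream, which won't work. With 'rb', the open() call itself fails, because /tmp/output2.csv does not exist until something has written it.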
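The difference between the two modes can be checked locally without any AWS calls. This is only a minimal sketch: payload stands in for response.content, and the scratch file name stream_demo.csv is made up for the illustration.

```python
import io
import os
import tempfile

payload = b"col_a,col_b\n1,2\n"  # stand-in for response.content

# Scratch path for the demo; in Lambda this would live under /tmp.
path = os.path.join(tempfile.gettempdir(), "stream_demo.csv")

# Opened with 'wb', the handle can be written to but not read from,
# so it cannot serve as the source of an upload.
with open(path, "wb") as writer:
    writer.write(payload)
    writer_is_readable = writer.readable()

# Reopened with 'rb', the same file is a readable binary stream.
with open(path, "rb") as reader:
    reader_is_readable = reader.readable()
    round_trip = reader.read()

# BytesIO gives a readable binary stream without touching the disk.
in_memory = io.BytesIO(payload)
in_memory_is_readable = in_memory.readable()

os.remove(path)
```

Either of the two readable streams is the kind of object an upload call can consume; the write-only handle is not.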

Write the file, and then simply use bucket.upload_file() afterwards, like so:

s3 = boto3.resource('s3')
bucket = s3.Bucket('transportation.manifests.parsed')
# response.content is bytes, so the file must be opened in binary write mode
with open('/tmp/output2.csv', 'wb') as data:
    data.write(response.content)

key = 'csv/' + key
bucket.upload_file('/tmp/output2.csv', key)



Answer 2:


Assuming Python 3.6, the way I usually do this is to wrap the bytes content in BytesIO to create a file-like object. Per the boto3 docs, you can then use the transfer manager for a managed transfer:

from io import BytesIO
import boto3
s3 = boto3.client('s3')

fileobj = BytesIO(response.content)

s3.upload_fileobj(fileobj, 'mybucket', 'mykey')

If that doesn't work, I'd double-check that all IAM permissions are correct.
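The question also asks for a unique name under the csv folder. One possible way to combine that with the BytesIO approach is sketched below. The helper names make_unique_key and upload_bytes are hypothetical, not part of boto3, and the UUID-based naming scheme is an assumption rather than anything the question specifies. The client is passed in as an argument, so the only boto3 call used is upload_fileobj.

```python
import os
import uuid
from io import BytesIO


def make_unique_key(source_name, prefix="csv", ext="csv"):
    """Build a key such as 'csv/manifest-<32 hex chars>.csv' (hypothetical helper)."""
    stem = os.path.splitext(os.path.basename(source_name))[0]
    return "{0}/{1}-{2}.{3}".format(prefix, stem, uuid.uuid4().hex, ext)


def upload_bytes(s3_client, bucket_name, key, payload):
    """Upload a bytes payload through a client that exposes upload_fileobj."""
    # Wrap the bytes in BytesIO so the client receives a readable stream.
    s3_client.upload_fileobj(BytesIO(payload), bucket_name, key)
    return key
```

Inside the Lambda function this might be called as upload_bytes(boto3.client('s3'), 'transportation.manifests.parsed', make_unique_key(PDFfilename), response.content), which skips /tmp entirely.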



Source: https://stackoverflow.com/questions/49163099/writing-a-file-to-s3-using-lambda-in-python-with-aws
