How to create an AWS Lambda/API gateway python function that takes a pdf file as input using multipart/form-data?

徘徊边缘 submitted on 2019-12-05 09:06:45

Question


I have been struggling with this for a while now. I need to create a resource in API Gateway, linked to a Lambda function, that takes a PDF file as input sent as a multipart/form-data POST request. To keep it simple, I am just returning the file for now.

When I call the API with the following curl command, I get an Internal server error from AWS. Has anyone succeeded in sending a PDF file to Lambda without the S3 trick (uploading the file to S3 first)?

Thank you all in advance for any hint.

Commands/Files:

curl

curl -vvv -X POST -H "Content-Type: multipart/form-data" -F "content=@file.pdf" https://...MYAPIHERE.../pdf
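For testing without curl, the same kind of request can be built with Python's standard library. This is a sketch, not part of the original question: `build_multipart` and `post_pdf` are hypothetical helper names, and the URL you pass in is your own deployed .../pdf endpoint. It also shows what curl does behind the scenes with -F: it picks a boundary string and appends it to the Content-Type header, which is what the server later needs to split the body.

```python
import uuid
import urllib.request


# Hypothetical helpers, not part of the original question.
def build_multipart(field_name, filename, data, content_type="application/pdf"):
    """Return (body, Content-Type header) for a single-file multipart/form-data request."""
    boundary = uuid.uuid4().hex  # random boundary, assumed not to occur inside the file
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_pdf(url, path):
    """POST a local PDF as the form field "content", like curl -F "content=@file.pdf"."""
    with open(path, "rb") as f:
        body, content_type = build_multipart("content", path.rsplit("/", 1)[-1], f.read())
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": content_type})
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```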

I am currently using the Serverless Framework and Python 3.

Below are my files:

serverless.yaml

functions:
  pdf:
    handler: handler.pdf
    events:
      - http:
          path: /pdf
          method: post 
          integration: lambda
          request:
            template:
              application/json: "$input.json('$')"
          response:
            headers:
              Content-Type: "'application/json'"

handler.py

import json


def pdf(event, context):
    # Read the uploaded file from the event and echo it back as JSON
    pdf = event.get('content')
    out = {'statusCode': 200,
           'isBase64Encoded': False,
           'headers': {"content-type": "application/json"},
           'body': json.dumps({
               'input':  pdf,
               'inputType': 'url',
               #'tags': list(tags.keys()),
               'error': None})}
    return out

Answer 1:


I finally managed to solve this after a lot of googling and with help from the AWS support team.

It turns out that API Gateway looks at the "Content-Type" or "Accept" header of the incoming request and matches it against the Binary Media Types setting to decide which payloads are treated as binary. That means both content types, multipart/form-data and application/pdf, have to be registered as binary media types.
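To make the matching step concrete, here is a deliberately simplified model of that check. It is an assumption for illustration, not API Gateway's actual implementation, and `is_binary_payload` is a hypothetical name. The point it shows is that only the media type matters, so the "; boundary=..." parameter curl adds does not prevent a match, and a request whose type is not in the list is handled as text.

```python
def is_binary_payload(content_type, binary_media_types):
    """Simplified model (assumption, not AWS's real code) of the binary media type check.

    Only the media type is compared, so parameters such as "; boundary=..." that
    curl adds to multipart/form-data are ignored. "*/*" is treated as match-all.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    for pattern in binary_media_types:
        pattern = pattern.strip().lower()
        if pattern in ("*/*", media_type):
            return True
    return False


binary_types = ["multipart/form-data", "application/pdf"]
print(is_binary_payload("multipart/form-data; boundary=------abc", binary_types))  # True
print(is_binary_payload("application/json", binary_types))  # False
```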

With the Serverless Framework, this can be done with the serverless-apigw-binary plugin by adding the following to serverless.yaml:

plugins:
  - serverless-apigw-binary 

custom:
  apigwBinary:
    types:           #list of mime-types
      - 'multipart/form-data'
      - 'application/pdf'
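If you would rather not use the plugin, the same Binary Media Types setting can be changed through the API Gateway API. The sketch below assumes boto3 and AWS credentials are available; `rest_api_id` is your API's ID (not given in the original) and both helper names are hypothetical. In the patch path the "/" inside a media type is escaped as "~1".

```python
def binary_media_type_patch_ops(media_types):
    """Build PATCH operations that add each media type to binaryMediaTypes.

    "/" inside the media type is escaped as "~1" (and "~" as "~0"),
    e.g. multipart/form-data -> multipart~1form-data.
    """
    ops = []
    for media_type in media_types:
        escaped = media_type.replace("~", "~0").replace("/", "~1")
        ops.append({"op": "add", "path": f"/binaryMediaTypes/{escaped}"})
    return ops


def add_binary_media_types(rest_api_id, media_types, region_name=None):
    """Apply the operations with boto3 (assumes boto3 is installed and credentials are set)."""
    import boto3  # imported here so the pure helper above works without boto3

    client = boto3.client("apigateway", region_name=region_name)
    return client.update_rest_api(
        restApiId=rest_api_id,
        patchOperations=binary_media_type_patch_ops(media_types),
    )
```

As with other REST API settings, the change only reaches clients after the API is redeployed to a stage.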

However, since the Lambda integration expects API Gateway to deliver the payload as application/json, the binary data cannot be passed through directly. The ContentHandling setting therefore has to be "CONVERT_TO_TEXT", which makes API Gateway base64-encode the binary payload into a string. In the YAML file this translates to:

contentHandling: CONVERT_TO_TEXT
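A small model of what this setting means for the function: the raw bytes are base64-encoded on the way in, so the Lambda code has to decode them to get the original file back. `convert_to_text` and `convert_back` are hypothetical names used only for this illustration.

```python
import base64


def convert_to_text(raw):
    """Model of what CONVERT_TO_TEXT does to a binary request body: base64-encode it."""
    return base64.b64encode(raw).decode("ascii")


def convert_back(text):
    """What the Lambda function has to do to recover the original bytes."""
    return base64.b64decode(text)


raw = b"%PDF-1.4\n\xe2\xe3\xcf\xd3\n"  # start of a PDF, including non-UTF-8 bytes
text = convert_to_text(raw)
assert convert_back(text) == raw
print(len(raw), len(text))  # prints: 14 20
```

The encoded text is about a third larger than the file, since base64 turns every 3 bytes into 4 characters.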

The final piece of the puzzle was solved by Kris Gohlson in serverless-thumbnail. Thank you for that, Kris. I still wonder how you came up with it...


serverless.yaml

plugins:
  - serverless-apigw-binary 

custom:
  apigwBinary:
    types:           #list of mime-types
      - 'multipart/form-data'
      - 'application/pdf'

functions:
  pdf:
    handler: handler.pdf
    events:
      - http:
          path: /pdf
          method: post 
          integration: lambda
          request:
            contentHandling: CONVERT_TO_TEXT
            passThrough: WHEN_NO_TEMPLATES
            template:
              application/pdf: "{'body': $input.json('$')}"
              multipart/form-data: "{'body': $input.json('$')}"
          response:
            contentHandling: CONVERT_TO_BINARY
            headers:
              Content-Type: "'application/json'"
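With this configuration the function still has to undo two transformations itself: base64-decode the body and split the multipart payload. Below is a minimal standard-library sketch of handler.py under these assumptions: the request template delivers the base64 text in event["body"], and the body starts directly with the "--boundary" line (curl sends no preamble). The boundary is read from that first line because the template forwards only the body, not the Content-Type header. `extract_form_field` is a hypothetical helper; the field name "content" comes from the curl -F option in the question. The returned dict is only illustrative.

```python
import base64
import re


def extract_form_field(body, field_name):
    """Return the raw bytes of one multipart/form-data field (hypothetical helper).

    Assumes the body begins with the delimiter line "--<boundary>" (no preamble).
    """
    delimiter = body.split(b"\r\n", 1)[0]
    if not delimiter.startswith(b"--"):
        raise ValueError("body does not start with a multipart delimiter")
    for part in body.split(delimiter):
        # Skip the empty chunk before the first delimiter and the closing "--\r\n"
        if not part.strip() or part.startswith(b"--"):
            continue
        headers, sep, content = part.partition(b"\r\n\r\n")
        if not sep:
            continue
        # \b keeps 'filename="..."' from matching as 'name="..."'
        match = re.search(rb'\bname="([^"]*)"', headers)
        if match and match.group(1) == field_name.encode():
            # Content is followed by the CRLF that precedes the next delimiter
            return content[:-2] if content.endswith(b"\r\n") else content
    raise KeyError(field_name)


def pdf(event, context):
    # Assumption: {'body': $input.json('$')} plus CONVERT_TO_TEXT puts the
    # base64-encoded multipart body into event["body"].
    raw = base64.b64decode(event["body"])
    pdf_bytes = extract_form_field(raw, "content")  # matches curl -F "content=@file.pdf"
    return {"size": len(pdf_bytes), "isPdf": pdf_bytes.startswith(b"%PDF")}
```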


Source: https://stackoverflow.com/questions/54610903/how-to-create-an-aws-lambda-api-gateway-python-function-that-takes-a-pdf-file-as
