Read Parquet file stored in S3 with AWS Lambda (Python 3)

星月不相逢 2021-01-02 03:23

I am trying to load, process and write Parquet files in S3 with AWS Lambda. My testing / deployment process is:

https://github.com/lambci/docker-lambda as a

4 answers

野趣味 (original poster)

2021-01-02 04:15
I was able to accomplish writing parquet files into S3 using fastparquet. It's a little tricky but my breakthrough came when I realized that to put together all the dependencies, I had to use the same exact Linux that Lambda is using.

Here's how I did it:

1. Spin up an EC2 instance using the Amazon Linux image that is used with Lambda

Source: https://docs.aws.amazon.com/lambda/latest/dg/current-supported-versions.html

Linux image: https://console.aws.amazon.com/ec2/v2/home#Images:visibility=public-images;search=amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2

Note: you might need to install many packages and change the Python version to 3.6, as this Linux image is not meant for development. Here's how I looked for packages:
```
sudo yum list | grep python3
```
I installed:
```
python36.x86_64
python36-devel.x86_64
python36-libs.x86_64
python36-pip.noarch
python36-setuptools.noarch
python36-tools.x86_64
```
2. Used the instructions from here to build a zip file with all of the dependencies that my script would use, by installing them all into one folder and then zipping them with these commands:
```
mkdir parquet
cd parquet
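pip install -t . fastparquet
# pip install -t . (any other dependencies)
# copy your Python handler file into this folder, then zip everything
# (parquet.zip is just an example name):
zip -r ../parquet.zip .
# upload parquet.zip into Lambda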
```
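If the `zip` tool is not installed on the build instance, the packaging step can also be done from Python itself. The following is only a minimal sketch using the standard library; the function name `build_package` and the paths are made up for illustration and are not part of the original answer. It stores every file relative to the folder root, so the handler module ends up at the top level of the archive.

```python
import zipfile
from pathlib import Path


def build_package(src_dir, zip_path):
    """Zip the contents of src_dir so that files sit at the archive root.

    Returns the number of files written. Hypothetical helper, for illustration.
    """
    src = Path(src_dir).resolve()
    out = Path(zip_path).resolve()
    count = 0
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            # Skip directories, and skip the archive itself if it lives in src_dir
            if path.is_file() and path != out:
                # Archive names are relative to src_dir, with forward slashes
                zf.write(path, path.relative_to(src).as_posix())
                count += 1
    return count
```

For example, `build_package("parquet", "parquet.zip")` would package the folder created in the commands above; the resulting file still has to be uploaded to Lambda separately.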
Note: there are some constraints I had to work around: Lambda doesn't let you upload a zip larger than 50 MB, and the unzipped package can't exceed 250 MB. If anyone knows a better way to get dependencies into Lambda, please do share.
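To avoid finding out about the size constraints only at upload time, the package can be checked locally first. This is a rough sketch, not an official tool: the helper names are hypothetical, and the default limits are assumptions based on AWS's published quotas (50 MB zipped for direct upload, 250 MB unzipped), so pass different values if your limits differ.

```python
import os
import zipfile

# Assumed limits; verify against the current AWS Lambda quotas
ZIPPED_LIMIT = 50 * 1024 * 1024      # direct-upload limit for the .zip file
UNZIPPED_LIMIT = 250 * 1024 * 1024   # limit for the unpacked contents


def package_sizes(zip_path):
    """Return (zipped_bytes, unzipped_bytes) for a deployment archive."""
    zipped = os.path.getsize(zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        # file_size is the uncompressed size of each archive member
        unzipped = sum(info.file_size for info in zf.infolist())
    return zipped, unzipped


def fits_lambda(zip_path, zipped_limit=ZIPPED_LIMIT, unzipped_limit=UNZIPPED_LIMIT):
    """True if the archive is within both the zipped and unzipped limits."""
    zipped, unzipped = package_sizes(zip_path)
    return zipped <= zipped_limit and unzipped <= unzipped_limit
```

Calling `fits_lambda("parquet.zip")` before uploading gives a quick yes/no answer, and `package_sizes` shows how close the package is to each limit.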

Source: Write parquet from AWS Kinesis firehose to AWS S3