How to read a list of parquet files from S3 as a pandas dataframe using pyarrow?

Backend · Unresolved

小蘑菇 2020-12-04 09:15

I have a hacky way of achieving this using boto3 (1.4.4), pyarrow (0.4.1) and pandas (0.20.3).

First, I can read a single parq

7 Answers

生来不讨喜 (OP)

2020-12-04 09:40
Provided you have the right packages installed
```
$ pip install pandas==1.1.0 pyarrow==1.0.0 s3fs==0.4.2
```
and your AWS shared config and credentials files are configured appropriately, you can use pandas right away:
```
import pandas as pd

df = pd.read_parquet("s3://bucket/key.parquet")
```
If you have multiple AWS profiles, you may also need to set
```
$ export AWS_DEFAULT_PROFILE=profile_under_which_the_bucket_is_accessible
```
so you can access your bucket.
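
Since the question asks about a list of files, here is a minimal sketch of one way to extend the single-file call above: read each file with `pd.read_parquet` and concatenate the results. The helper names `s3_url` and `read_parquet_files`, the `reader` parameter, and the bucket and key names in the comments are all hypothetical, introduced only for illustration. It assumes every file has a compatible schema.

```python
import pandas as pd


def s3_url(bucket, key):
    """Build an s3:// URL from a bucket name and an object key."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def read_parquet_files(urls, reader=pd.read_parquet):
    """Read each Parquet file and concatenate the results into one DataFrame.

    `reader` defaults to pd.read_parquet; it is a parameter only so the
    helper can be exercised without touching S3 (an assumption of this sketch).
    """
    frames = [reader(url) for url in urls]
    if not frames:
        # Nothing to read: return an empty frame instead of letting concat raise
        return pd.DataFrame()
    # ignore_index=True renumbers the rows 0..n-1 across all files
    return pd.concat(frames, ignore_index=True)


# Usage (hypothetical bucket and keys; needs s3fs and valid AWS credentials):
# urls = [s3_url("bucket", k) for k in ["data/part-0.parquet", "data/part-1.parquet"]]
# df = read_parquet_files(urls)
```

This loads every file fully into memory before concatenating, so it is probably only suitable when the combined data fits in RAM.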