How to read a list of parquet files from S3 as a pandas dataframe using pyarrow?

小蘑菇

I have a hacky way of achieving this using boto3 (1.4.4), pyarrow (0.4.1) and pandas (0.20.3).

First, I can read a single parquet file…

7 Answers

夕颜

2020-12-04 09:43

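It can also be done with boto3 alone, without calling pyarrow directly. Note that `pd.read_parquet` still needs a parquet engine (pyarrow or fastparquet) to be installed: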

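```python
import io

import boto3
import pandas as pd

# Download the parquet object into an in-memory buffer
buffer = io.BytesIO()
s3 = boto3.resource('s3')
obj = s3.Object('bucket_name', 'key')  # placeholders: your bucket name and object key
obj.download_fileobj(buffer)

# Rewind the buffer before handing it to pandas
buffer.seek(0)
df = pd.read_parquet(buffer)

print(df.head())
```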
