How to read a list of parquet files from S3 as a pandas dataframe using pyarrow?

小蘑菇 2020-12-04 09:15

I have a hacky way of achieving this using boto3 (1.4.4), pyarrow (0.4.1) and pandas (0.20.3).

First, I can read a single parq

7 Answers
  • 2020-12-04 09:43

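    This can also be done with boto3 and pandas, without calling pyarrow directly. Note that pd.read_parquet still needs a parquet engine (pyarrow or fastparquet) installed.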

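    import io

    import boto3
    import pandas as pd

    # Download the parquet object into an in-memory buffer
    buffer = io.BytesIO()
    s3 = boto3.resource('s3')
    obj = s3.Object('bucket_name', 'key')  # avoid shadowing the builtin `object`
    obj.download_fileobj(buffer)

    # Rewind the buffer so the reader starts at the beginning
    buffer.seek(0)
    df = pd.read_parquet(buffer)

    print(df.head())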
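
The question asks about a list of parquet files, while the snippet above reads a single key. Here is a minimal sketch of how the same buffer approach might be extended to every parquet object under a prefix. The helper names `parquet_keys` and `read_parquet_prefix` are made up for illustration. The sketch assumes all files share a compatible schema, that the combined data fits in memory, and that pandas has a parquet engine installed.

```python
import io


def parquet_keys(pages):
    """Collect keys ending in .parquet from list_objects_v2 response pages."""
    keys = []
    for page in pages:
        # "Contents" is absent from a page when no object matches the prefix.
        for entry in page.get("Contents", []):
            if entry["Key"].endswith(".parquet"):
                keys.append(entry["Key"])
    return keys


def read_parquet_prefix(bucket, prefix):
    """Read every parquet object under s3://bucket/prefix into one DataFrame."""
    # Imported here so parquet_keys() stays usable without these packages.
    import boto3
    import pandas as pd

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    keys = parquet_keys(paginator.paginate(Bucket=bucket, Prefix=prefix))

    frames = []
    for key in keys:
        buffer = io.BytesIO()
        s3.download_fileobj(bucket, key, buffer)
        buffer.seek(0)  # rewind before handing the buffer to pandas
        frames.append(pd.read_parquet(buffer))

    # pd.concat raises ValueError on an empty list, i.e. when no key matched.
    return pd.concat(frames, ignore_index=True)
```

A call would look like `df = read_parquet_prefix('bucket_name', 'some/prefix/')`, where both arguments are placeholders. Filtering on the `.parquet` suffix skips marker objects such as `_SUCCESS` that some writers leave next to the data files.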
    