How to read a Parquet file into a Pandas DataFrame?

醉酒成梦 2020-12-07 18:03

How can I read a modestly sized Parquet dataset into an in-memory Pandas DataFrame without setting up cluster computing infrastructure such as Hadoop or Spark? This is only

3 answers
  •  我在风中等你
    2020-12-07 18:49

    pandas 0.21 introduces new functions for Parquet:

    pd.read_parquet('example_pa.parquet', engine='pyarrow')
    

    or

    pd.read_parquet('example_fp.parquet', engine='fastparquet')
    

    The pandas documentation on Parquet explains:

    These engines are very similar and should read/write nearly identical parquet format files. These libraries differ by having different underlying dependencies (fastparquet by using numba, while pyarrow uses a c-library).
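
To make the two calls above concrete, here is a minimal sketch of a helper that reads a local Parquet file with whichever engine happens to be installed. The helper names `available_engines` and `read_parquet_local`, and the file name `example.parquet`, are hypothetical and only for illustration; the sketch assumes pandas plus at least one of pyarrow or fastparquet is installed.

```python
import importlib.util
import os
import tempfile

import pandas as pd


def available_engines():
    """Return the Parquet engines importable here (hypothetical helper)."""
    return [name for name in ("pyarrow", "fastparquet")
            if importlib.util.find_spec(name) is not None]


def read_parquet_local(path, columns=None):
    """Read a local Parquet file with the first installed engine.

    `columns` optionally restricts the read to a subset of columns,
    which keeps memory use down for wide files.
    """
    engines = available_engines()
    if not engines:
        raise ImportError("install pyarrow or fastparquet to read Parquet files")
    return pd.read_parquet(path, engine=engines[0], columns=columns)


if __name__ == "__main__":
    engines = available_engines()
    print("installed engines:", engines)
    if engines:
        # Round trip: write a tiny frame to a temporary file, then read it back.
        frame = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "example.parquet")
            frame.to_parquet(path, engine=engines[0], index=False)
            print(read_parquet_local(path))
```

No Hadoop or Spark is involved: both engines read the file directly from the local filesystem. Passing `columns=[...]` is worth considering when only part of the dataset is needed, since Parquet is a columnar format.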
