How to read a Parquet file into Pandas DataFrame?

醉酒成梦 2020-12-07 18:03

How can I read a modestly sized Parquet dataset into an in-memory Pandas DataFrame without setting up cluster computing infrastructure such as Hadoop or Spark? This is only

3 answers

陌清茗 (OP)

2020-12-07 18:46

Update: since I wrote this answer, a lot of work has gone into this area. Look at Apache Arrow for better reading and writing of Parquet. See also: http://wesmckinney.com/blog/python-parquet-multithreading/

There is a Python Parquet reader that works relatively well: https://github.com/jcrobak/parquet-python

It creates Python objects, which you then have to move into a Pandas DataFrame, so the process will be slower than, for example, pd.read_csv.
