How to read a Parquet bytes object in Python

蹲街弑〆低调 submitted on 2019-12-11 06:20:05

Question


I have a Python object that I know contains a Parquet file loaded into memory. (I do not have the option of actually reading it from a file.)

The object var_1 contains b'PAR1\x15\x....1\x00PAR1

When I check the type:

type(var_1)

the result is bytes.
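The b'PAR1' at both ends of var_1 is the Parquet magic number: per the Parquet format specification, a file starts and ends with these four bytes, and the four bytes just before the trailing magic hold the footer (file metadata) length as a little-endian 32-bit integer. A quick sanity check along these lines (stdlib only; the function name inspect_parquet_bytes is hypothetical) can confirm that a bytes object looks like a complete Parquet file before handing it to a reader:

```python
import struct


def inspect_parquet_bytes(data: bytes) -> int:
    """Validate the Parquet magic bytes and return the footer length.

    A Parquet file begins and ends with the 4-byte magic b"PAR1". The 4 bytes
    just before the trailing magic store the length of the footer (the file
    metadata) as a little-endian unsigned 32-bit integer.
    """
    # 4 (leading magic) + 4 (footer length) + 4 (trailing magic) = 12 bytes minimum
    if len(data) < 12 or data[:4] != b"PAR1" or data[-4:] != b"PAR1":
        raise ValueError("not a Parquet file: missing PAR1 magic bytes")
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    # The footer must fit between the leading magic and the length field.
    if footer_len > len(data) - 12:
        raise ValueError("footer length exceeds the data size; bytes may be truncated")
    return footer_len


# Hypothetical usage on the object from the question:
# footer_len = inspect_parquet_bytes(var_1)
```

This only checks the framing of the file; it does not parse the metadata itself, which is what a real reader such as pyarrow does next.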

Is there a way to read this, say, into a pandas DataFrame?

I have tried:

1)

from fastparquet import ParquetFile
pf = ParquetFile(var_1)

And got:

TypeError: a bytes-like object is required, not 'str'

2)

import pyarrow.parquet as pq
dataset = pq.ParquetDataset(var_1)

and got:

TypeError: not a path-like object

Note that the solution to "How to read a Parquet file into Pandas DataFrame?", i.e. pd.read_parquet(var_1, engine='fastparquet'), also results in TypeError: a bytes-like object is required, not 'str'


Answer 1:


You can do this by wrapping the bytes object in a pyarrow.BufferReader.

import pyarrow as pa
import pyarrow.parquet as pq

var_1 = …  # the bytes object holding the Parquet file
reader = pa.BufferReader(var_1)
table = pq.read_table(reader)
df = table.to_pandas()  # This results in a pandas.DataFrame


Source: https://stackoverflow.com/questions/58061225/how-to-read-a-parquet-bytes-object-in-python
