Question
I have a Python object that I know holds a Parquet file loaded into memory. (I am not able to read it from an actual file.)
The object var_1 contains
b'PAR1\x15\x....1\x00PAR1'
When I check the type with
type(var_1)
the result is bytes.
Is there a way to read this, say into a pandas DataFrame?
I have tried:
1)
from fastparquet import ParquetFile
pf = ParquetFile(var_1)
And got:
TypeError: a bytes-like object is required, not 'str'
2)
import pyarrow.parquet as pq
dataset = pq.ParquetDataset(var_1)
And got:
TypeError: not a path-like object
Note that the solution from "How to read a Parquet file into Pandas DataFrame?", i.e. pd.read_parquet(var_1, engine='fastparquet'), also fails, with TypeError: a bytes-like object is required, not 'str'
Answer 1:
You can do this by wrapping the bytes object in a pyarrow.BufferReader.
import pyarrow as pa
import pyarrow.parquet as pq
var_1 = …
reader = pa.BufferReader(var_1)
table = pq.read_table(reader)
df = table.to_pandas() # This results in a pandas.DataFrame
Source: https://stackoverflow.com/questions/58061225/how-to-read-a-parquet-bytes-object-in-python