Cross read parquet files between R and Python

Submitted by 自闭症网瘾萝莉.ら on 2019-12-11 17:06:23

Question


We have generated two parquet files, one in Dask (Python) and another with R Drill (using the sergeant package). They use different implementations of parquet; see my other parquet question.

We are not able to cross-read the files (Python can't read the R file and vice versa).
When reading the Python parquet file in the R environment, we receive the following error: system error: IllegalStateException: UTF8 can only annotate binary field.
When reading the R/Drill parquet file in Dask, we get a FileNotFoundError: [Errno 2] No such file or directory ...\_metadata (which is self-explanatory).
What are the options for cross-reading parquet files between R and Python?

Any insights would be appreciated.


Answer 1:


To read Drill-style parquet datasets (a directory of part files without a _metadata file) with fastparquet/dask, you need to pass a list of the filenames, e.g.,

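import glob
import dask.dataframe as dd
files = glob.glob('mydata/*/*.parquet')  # collect every part file explicitly
df = dd.read_parquet(files)  # a file list means no _metadata file is needed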
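
The file-list approach can be wrapped in a small helper. This is only a sketch: the name `list_parquet_parts` and its default pattern are hypothetical, chosen to mirror the 'mydata/*/*.parquet' layout above, and are not part of Dask, fastparquet, or Drill. It uses only the standard library, so it can be checked without either engine installed.

```python
import glob
import os


def list_parquet_parts(root, pattern="*/*.parquet"):
    """Return a sorted list of parquet part files under `root`.

    Hypothetical helper: the default pattern assumes one level of
    sub-directories, as in 'mydata/*/*.parquet'. Adjust it to match
    the layout your Drill query actually produced.
    """
    # Sorting makes the file order (and so the partition order) deterministic.
    return sorted(glob.glob(os.path.join(root, pattern)))
```

The result would then be passed on as before, e.g. `dd.read_parquet(list_parquet_parts('mydata'))`. An empty list means the pattern did not match the directory layout, which is worth checking before calling Dask.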

The error in the other direction might be a bug or, judging from your other question, may indicate that you wrote fixed-length strings, which Drill/R does not support.



Source: https://stackoverflow.com/questions/45433607/cross-read-parquet-files-between-r-and-python
