I'm working with pandas and with Spark dataframes. The dataframes are always very large (> 20 GB), and the standard Spark functions are not sufficient for those sizes. Currently
From https://issues.apache.org/jira/browse/SPARK-6235, "Support for parallelizing R data.frame larger than 2GB" is resolved.
From https://pandas.pydata.org/pandas-docs/stable/r_interface.html ("Converting DataFrames into R objects"), you can convert a pandas DataFrame to an R data.frame.
So perhaps the route is pandas -> R -> Spark -> HDFS?
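For the pandas -> R hop, a minimal sketch using rpy2 (the library the pandas R-interface page refers to) might look like the following; the exact converter call depends on the rpy2 version (py2rpy in rpy2 3.x, py2ri in older 2.x), and the R -> Spark -> HDFS part would then happen on the R side via SparkR, which is not shown here.

    import pandas as pd
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter

    # Small example frame; in practice this would be the large pandas DataFrame.
    pdf = pd.DataFrame({"id": range(5), "value": [0.1, 0.2, 0.3, 0.4, 0.5]})

    # Convert the pandas DataFrame into an R data.frame inside a local
    # conversion context, so the pandas<->R conversion rules only apply here.
    with localconverter(ro.default_converter + pandas2ri.converter):
        r_df = ro.conversion.py2rpy(pdf)

    # r_df is now an R data.frame in the embedded R session; from R it could
    # be handed to SparkR's createDataFrame and written out to HDFS, which is
    # the remaining part of the proposed pipeline (not shown here).
    print(type(r_df))

Whether this actually helps with > 20 GB of data is a separate question, since the data still has to pass through a single R session before Spark parallelizes it.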