How to save a huge pandas dataframe to hdfs?

死守一世寂寞 2020-12-05 08:55

I'm working with pandas and with Spark dataframes. The dataframes are always very big (> 20 GB), and the standard Spark functions are not sufficient for those sizes. Currently …

4 Answers
  •  眼角桃花
    2020-12-05 09:06

    From https://issues.apache.org/jira/browse/SPARK-6235, the issue

    "Support for parallelizing R data.frame larger than 2GB"

    is resolved.

    From https://pandas.pydata.org/pandas-docs/stable/r_interface.html,

    "Converting DataFrames into R objects",

    you can convert a pandas DataFrame to an R data.frame.

    So perhaps the transformation pandas -> R -> Spark -> HDFS?
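    As a sketch of the idea in pure pandas (chunk sizes, column names, and the HDFS target path are illustrative, not from the question): a frame too large to hand to Spark in one piece can be split into row slices, and each slice converted or written separately, e.g. with `spark.createDataFrame(chunk)` followed by `.write.parquet("hdfs://...")` on a Spark session you already have.

    ```python
    # Hedged sketch: split a pandas DataFrame into row chunks small enough
    # to hand to Spark (or to write as individual files that are then
    # copied to HDFS). rows_per_chunk is an illustrative tuning knob.
    import pandas as pd

    def iter_chunks(df: pd.DataFrame, rows_per_chunk: int = 1_000_000):
        """Yield successive row slices of df, each at most rows_per_chunk long."""
        for start in range(0, len(df), rows_per_chunk):
            yield df.iloc[start:start + rows_per_chunk]

    # Tiny demo frame (a real frame would be > 20 GB):
    df = pd.DataFrame({"a": range(10), "b": range(10)})
    chunks = list(iter_chunks(df, rows_per_chunk=4))
    print([len(c) for c in chunks])  # → [4, 4, 2]
    ```

    Each chunk fits under Spark's per-object limits, so the conversion or upload step never has to hold the whole 20 GB at once.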
