How to save a huge pandas dataframe to hdfs?

死守一世寂寞 · 2020-12-05 08:55

I'm working with both pandas and Spark DataFrames. The DataFrames are always very large (> 20 GB), and the standard Spark functions are not sufficient for those sizes. Currently

4 Answers
  •  不知归路 · 2020-12-05 09:11

    One hack is to split the big pandas DataFrame into N smaller pandas DataFrames, each under 2 GB (horizontal, i.e. row-wise, partitioning), create N separate Spark DataFrames from them, and then union those into one final DataFrame to write to HDFS. I am assuming that your master machine is powerful enough to hold the pandas DataFrame, and that you also have a cluster available in which you are running Spark.
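    A minimal sketch of that approach, assuming PySpark is available; the chunk size, function names, and HDFS path below are illustrative, not from the original answer:

    ```python
    # Sketch of the horizontal-partitioning hack described above.
    # The chunk size and HDFS path are assumptions for illustration.
    from functools import reduce

    import pandas as pd


    def iter_chunks(df: pd.DataFrame, chunk_rows: int):
        """Yield consecutive row slices of at most `chunk_rows` rows."""
        for start in range(0, len(df), chunk_rows):
            yield df.iloc[start:start + chunk_rows]


    def write_big_pandas_to_hdfs(pdf: pd.DataFrame, hdfs_path: str,
                                 chunk_rows: int = 1_000_000) -> None:
        # Imported here so the chunking helper works without Spark installed.
        from pyspark.sql import DataFrame, SparkSession

        spark = SparkSession.builder.getOrCreate()
        # Convert each pandas chunk to a Spark DataFrame separately so no
        # single conversion exceeds the ~2 GB serialization limit, then
        # union the pieces back into one DataFrame.
        parts = [spark.createDataFrame(chunk)
                 for chunk in iter_chunks(pdf, chunk_rows)]
        union_df = reduce(DataFrame.union, parts)
        union_df.write.mode("overwrite").parquet(hdfs_path)
    ```

    Tuning `chunk_rows` to your row width keeps each chunk well under the 2 GB limit; the final `parquet` write is distributed across the cluster rather than funneled through the driver.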
