How to save a huge pandas dataframe to hdfs?

死守一世寂寞 · 2020-12-05 08:55

I'm working with both pandas and Spark DataFrames. The DataFrames are always very large (> 20 GB), and the standard Spark functions are not sufficient for those sizes. Currently

4 Answers
  •  不知归路 · 2020-12-05 09:11

    One hack is to split the big pandas DataFrame into N smaller pandas DataFrames, each under 2 GB (horizontal, i.e. row-wise, partitioning), create N separate Spark DataFrames from them, and then union those into one final DataFrame to write to HDFS. I am assuming that your master machine is powerful enough to hold the pandas DataFrame, and that you also have a cluster available in which you are running Spark.
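    A minimal sketch of that approach, assuming PySpark is available; the chunk size, function names, and HDFS path below are illustrative, not from the original answer:

    ```python
    # Sketch of the horizontal-partitioning hack described above.
    # The chunk size and HDFS path are assumptions for illustration.
    from functools import reduce

    import pandas as pd


    def iter_chunks(df: pd.DataFrame, chunk_rows: int):
        """Yield consecutive row slices of at most `chunk_rows` rows."""
        for start in range(0, len(df), chunk_rows):
            yield df.iloc[start:start + chunk_rows]


    def write_big_pandas_to_hdfs(pdf: pd.DataFrame, hdfs_path: str,
                                 chunk_rows: int = 1_000_000) -> None:
        # Imported here so the chunking helper works without Spark installed.
        from pyspark.sql import DataFrame, SparkSession

        spark = SparkSession.builder.getOrCreate()
        # Convert each pandas chunk to a Spark DataFrame separately so no
        # single conversion exceeds the ~2 GB serialization limit, then
        # union the pieces back into one DataFrame.
        parts = [spark.createDataFrame(chunk)
                 for chunk in iter_chunks(pdf, chunk_rows)]
        union_df = reduce(DataFrame.union, parts)
        union_df.write.mode("overwrite").parquet(hdfs_path)
    ```

    Tuning `chunk_rows` to your row width keeps each chunk well under the 2 GB limit; the final `parquet` write is distributed across the cluster rather than funneled through the driver.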
