Spark dataframe write method writing many small files

轮回少年 · 2020-11-27 17:34

I've got a fairly simple job converting log files to parquet. It's processing 1.1TB of data (chunked into 64MB - 128MB files; our block size is 128MB), which is approx 12
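
A job of the kind described, writing date-partitioned parquet, might look roughly like the sketch below; the paths, the date column, and the spark session are illustrative assumptions, not taken from the question.

    import org.apache.spark.sql.SaveMode

    // Hypothetical shape of the job: read the raw log files and write them out
    // partitioned by date. Without a repartition before the write, each input
    // task writes its own file into every date directory it touches, which is
    // where the flood of small output files comes from.
    val logs = spark.read.json("s3://bucket/raw-logs/")     // illustrative path

    logs.write
      .mode(SaveMode.Append)
      .partitionBy("date")                                  // assumed partition column
      .parquet("s3://bucket/logs-parquet/")                 // illustrative path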

6 Answers
  •  陌清茗 (OP) · 2020-11-27 18:22

    You have to repartition your DataFrame so that it matches the partitioning of the DataFrameWriter; otherwise every task writes its own file into each date directory it touches, which is what produces all the small files.

    Try this:

    import org.apache.spark.sql.SaveMode
    import spark.implicits._   // for the $"col" column syntax (spark is the SparkSession)

    df
      .repartition($"date")              // one shuffle partition per date value
      .write
      .mode(SaveMode.Append)
      .partitionBy("date")
      .parquet(s"$path")
    
