Spark dataframe write method writing many small files

轮回少年 2020-11-27 17:34

I've got a fairly simple job converting log files to parquet. It's processing 1.1TB of data (chunked into 64MB - 128MB files - our block size is 128MB), which is approx 12

6 Answers
  •  被撕碎了的回忆
    2020-11-27 18:24

    I came across the same issue, and using coalesce solved my problem.

    import org.apache.spark.sql.SaveMode

    df
      .coalesce(3) // reduce to 3 partitions, so 3 output files
      .write.mode(SaveMode.Append)
      .parquet(s"$path")


    For more information on using coalesce or repartition, you can refer to the following: Spark: coalesce or repartition.
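
    As a rough sketch of the difference between the two (assuming a SparkSession and the hypothetical paths below, which are not from the question): coalesce merges existing partitions without a full shuffle, which is cheap but can leave unevenly sized files, while repartition shuffles the data and produces evenly sized partitions at a higher cost.

    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder().appName("small-files-demo").getOrCreate()
    val df = spark.read.parquet("/tmp/input")   // hypothetical input path

    // coalesce: narrow transformation, merges partitions without a full shuffle.
    // Cheap, but partition (and therefore file) sizes may be skewed.
    df.coalesce(3)
      .write.mode(SaveMode.Append)
      .parquet("/tmp/output-coalesce")

    // repartition: full shuffle, produces evenly sized partitions.
    // More expensive, but the output files are balanced.
    df.repartition(3)
      .write.mode(SaveMode.Append)
      .parquet("/tmp/output-repartition")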
