I've got a fairly simple job converting log files to parquet. It's processing 1.1TB of data (chunked into 64MB - 128MB files - our block size is 128MB), which is approx 12
I came across the same issue, and using coalesce solved my problem.
```scala
df.coalesce(3)                     // number of parts/files
  .write.mode(SaveMode.Append)
  .parquet(s"$path")
```
For more information on using coalesce or repartition, you can refer to the following question: spark: coalesce or repartition.
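If coalesce leaves the output files skewed (it only merges existing partitions and avoids a shuffle), repartition is the alternative: it does a full shuffle and produces evenly sized parts at the cost of moving data across the cluster. A minimal sketch, assuming a SparkSession and placeholder paths/reader that are not from the original answer:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Placeholder paths for illustration only.
val logsPath   = "/path/to/raw/logs"
val outputPath = "/path/to/parquet/output"

val spark = SparkSession.builder().appName("logs-to-parquet").getOrCreate()

// Assumes plain-text logs; swap in the reader that matches your log format.
val df = spark.read.text(logsPath)

// repartition(n) shuffles the data into n evenly sized partitions,
// so the write below produces n similarly sized parquet files.
df.repartition(3)
  .write
  .mode(SaveMode.Append)
  .parquet(outputPath)
```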