I've got a fairly simple job converting log files to parquet. It's processing 1.1TB of data (chunked into 64MB-128MB files; our block size is 128MB), which is approx 12
You have to repartition your DataFrame to match the partitioning of the DataFrameWriter.
Try this:
```scala
import org.apache.spark.sql.SaveMode
import spark.implicits._  // for the $"date" column syntax

df.repartition($"date")       // collocate each date's rows in one partition
  .write
  .mode(SaveMode.Append)
  .partitionBy("date")        // one output directory per date value
  .parquet(s"$path")
```
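Without the `repartition`, every task that happens to hold rows for a given date writes its own part-file into that date's directory, so you can end up with roughly (number of tasks × number of dates) small files. Repartitioning by the same column first shuffles all rows sharing a date into the same partition, so `partitionBy("date")` then writes a single file per date directory.

For context, here's a minimal end-to-end sketch. The SparkSession setup, the input/output paths, and the JSON reader are placeholders for illustration; only the final write chain comes from the snippet above, and it assumes your parsed logs already have a `date` column:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object LogsToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("logs-to-parquet")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical paths; substitute your own.
    val path = "/data/warehouse/logs_parquet"

    // Stand-in for however you parse your raw logs into a DataFrame
    // that includes a `date` column.
    val df = spark.read.json("/data/raw/logs/*.json")

    df.repartition($"date")
      .write
      .mode(SaveMode.Append)
      .partitionBy("date")
      .parquet(s"$path")

    spark.stop()
  }
}
```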