Spark partitionBy much slower than without it
Question: I tested writing with:

```scala
df.write.partitionBy("id", "name")
  .mode(SaveMode.Append)
  .parquet(filePath)
```

However, if I leave out the partitioning:

```scala
df.write
  .mode(SaveMode.Append)
  .parquet(filePath)
```

it executes 100x(!) faster. Is it normal for the same amount of data to take 100x longer to write when partitioning? There are 10 unique `id` values and 3,000 unique `name` values, and the DataFrame has 10 additional integer columns.

Answer 1: The first code snippet will write a parquet file per partition to the file system.
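The answer is cut off, but it points at the likely cause: with `partitionBy("id", "name")`, each write task emits a separate file for every distinct `(id, name)` combination it holds, so with up to 10 × 3,000 = 30,000 combinations the job can produce an enormous number of small files. A commonly suggested mitigation, shown here as a sketch rather than as part of the original answer, is to repartition the DataFrame by the same columns before writing, so each combination is concentrated in a single task and yields one larger file:

```scala
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.col

// Sketch of a common mitigation (not from the truncated answer):
// cluster rows by the partition columns first, so each task holds
// only a few (id, name) combinations and writes far fewer, larger files.
// `df` and `filePath` are the asker's; everything else is illustrative.
df.repartition(col("id"), col("name"))
  .write
  .partitionBy("id", "name")
  .mode(SaveMode.Append)
  .parquet(filePath)
```

The trade-off is an extra shuffle before the write, but for high-cardinality partition columns this is usually far cheaper than the file-per-partition-per-task explosion it avoids.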