According to Learning Spark:

"Keep in mind that repartitioning your data is a fairly expensive operation. Spark also has an optimized version of repartition() called coalesce() that allows avoiding data movement, but only if you are decreasing the number of RDD partitions."
Repartition: shuffles the data into a new number of partitions; the count can go up or down, and a full shuffle is always performed.
E.g. the initial DataFrame is partitioned into 200 partitions.
df.repartition(500): data will be shuffled from the 200 existing partitions into 500 new partitions.
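A minimal PySpark sketch of the repartition case (the local SparkSession config, the spark.range placeholder data, and the variable names are illustrative assumptions, not from the original):

```python
from pyspark.sql import SparkSession

# Hypothetical local session, just for demonstration.
spark = SparkSession.builder.master("local[4]").appName("repartition-demo").getOrCreate()

# Placeholder DataFrame, forced into 200 initial partitions.
df = spark.range(1_000_000).repartition(200)
print(df.rdd.getNumPartitions())  # 200

# Full shuffle: rows are redistributed across 500 new partitions.
df_500 = df.repartition(500)
print(df_500.rdd.getNumPartitions())  # 500

spark.stop()
```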
Coalesce: merges the data into a subset of the existing partitions, avoiding a full shuffle; it can only decrease the partition count.
df.coalesce(5): data from the remaining 195 partitions will be moved into 5 of the existing partitions, which themselves stay in place.
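And a matching sketch of the coalesce case, under the same assumptions (local session, spark.range placeholder data):

```python
from pyspark.sql import SparkSession

# Hypothetical local session, just for demonstration.
spark = SparkSession.builder.master("local[4]").appName("coalesce-demo").getOrCreate()

# Placeholder DataFrame in 200 partitions, as in the example above.
df = spark.range(1_000_000).repartition(200)

# coalesce keeps 5 of the existing partitions and merges the data from
# the other 195 into them; no full shuffle is performed.
df_5 = df.coalesce(5)
print(df_5.rdd.getNumPartitions())  # 5

spark.stop()
```

Note that coalesce cannot increase the partition count: asking for more partitions than currently exist simply leaves the partitioning unchanged.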