How to check the number of partitions of a Spark DataFrame without incurring the cost of .rdd
There are a number of questions about how to obtain the number of partitions of an RDD and/or a DataFrame; the answers are invariably `rdd.getNumPartitions` or `df.rdd.getNumPartitions`. Unfortunately, that is an expensive operation on a DataFrame, because `df.rdd` requires converting the DataFrame to an RDD, which takes on the order of the time it takes to run `df.count`.

I am writing logic that optionally `repartition`s or `coalesce`s a DataFrame, based on whether the current number of partitions is within a range of acceptable values or is instead below or above that range: `def repartition(inDf:`
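For context, a minimal sketch of what such a helper might look like today, assuming hypothetical bounds `minPartitions` and `maxPartitions` (not from the original code) and using the expensive `df.rdd.getNumPartitions` call this question is trying to avoid:

```scala
import org.apache.spark.sql.DataFrame

// Sketch only: the bounds and the method name are illustrative assumptions.
def repartitionWithinBounds(inDf: DataFrame,
                            minPartitions: Int = 8,
                            maxPartitions: Int = 200): DataFrame = {
  // The expensive step: df.rdd forces conversion of the DataFrame to an RDD
  // before getNumPartitions can be read.
  val current = inDf.rdd.getNumPartitions

  if (current < minPartitions) inDf.repartition(minPartitions) // too few: shuffle up
  else if (current > maxPartitions) inDf.coalesce(maxPartitions) // too many: merge down
  else inDf // already within the acceptable range
}
```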