How to check the number of partitions of a Spark DataFrame without incurring the cost of .rdd

Backend · Unresolved · 2 answers · 757 views
名媛妹妹 · 2020-12-19 05:56

There are a number of questions about how to obtain the number of partitions of an RDD and/or a DataFrame; the answers invariably are rdd.getNumPartitions or, for a DataFrame, df.rdd.getNumPartitions. Calling .rdd on a DataFrame is not free, and the title says what I am after: the partition count of a DataFrame without incurring the cost of .rdd.
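
For context, here is a minimal, self-contained sketch of the approach the question refers to; the session settings and the DataFrame are illustrative, not taken from the question:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("partition-count")
      .master("local[4]")
      .getOrCreate()

    // Any DataFrame works; this one exists only for illustration.
    val df = spark.range(0L, 1000000L).toDF("id")

    // The usual answer: .rdd materializes the DataFrame's underlying RDD,
    // and getNumPartitions then reads that RDD's partition metadata.
    val numPartitions: Int = df.rdd.getNumPartitions
    println(s"partitions: $numPartitions")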
2 Answers

醉话见心 · 2020-12-19 06:50

    In my experience, df.rdd.getNumPartitions is very fast; I have never seen it take more than a second or so.
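
    If you want to verify that on your own data, you can time the call directly. The timed helper below is a hypothetical convenience, not part of this answer, and it assumes a DataFrame df is already in scope:

        // Hypothetical helper for rough wall-clock timing.
        def timed[T](label: String)(body: => T): T = {
          val t0 = System.nanoTime()
          val result = body
          val elapsedMs = (System.nanoTime() - t0) / 1e6
          println(f"$label took $elapsedMs%.1f ms")
          result
        }

        val n = timed("df.rdd.getNumPartitions") {
          df.rdd.getNumPartitions
        }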

    Alternatively, you could try

        import org.apache.spark.sql.functions.spark_partition_id

        // Count the distinct partition IDs observed across all rows.
        val numPartitions: Long = df
          .select(spark_partition_id())
          .distinct()
          .count()


    which avoids using .rdd. One caveat worth noting: spark_partition_id() is evaluated per row, so partitions containing no rows never appear in the result and the distinct count can undercount; this query also runs a full Spark job over the data, whereas getNumPartitions only reads partition metadata.
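
    A small sketch of that caveat, assuming a local SparkSession named spark (the row and partition counts are illustrative):

        import org.apache.spark.sql.functions.spark_partition_id

        // 3 rows spread over 8 partitions, so at least 5 partitions are empty.
        val tiny = spark.range(3).toDF("id").repartition(8)

        println(tiny.rdd.getNumPartitions)                            // 8
        println(tiny.select(spark_partition_id()).distinct().count()) // at most 3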
