How to check the number of partitions of a Spark DataFrame without incurring the cost of .rdd

Backend · Unresolved · 2 answers · 757 views
名媛妹妹 · 2020-12-19 05:56

There are a number of questions about how to obtain the number of partitions of an RDD and/or a DataFrame; the answers invariably are rdd.getNumPartitions or, for a DataFrame, df.rdd.getNumPartitions. Calling .rdd on a DataFrame is not free, and the title says what I am after: the partition count of a DataFrame without incurring the cost of .rdd.
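
For context, here is a minimal, self-contained sketch of the approach the question refers to; the session settings and the DataFrame are illustrative, not taken from the question:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("partition-count")
      .master("local[4]")
      .getOrCreate()

    // Any DataFrame works; this one exists only for illustration.
    val df = spark.range(0L, 1000000L).toDF("id")

    // The usual answer: .rdd materializes the DataFrame's underlying RDD,
    // and getNumPartitions then reads that RDD's partition metadata.
    val numPartitions: Int = df.rdd.getNumPartitions
    println(s"partitions: $numPartitions")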
2 Answers

醉话见心 · 2020-12-19 06:50

    In my experience, df.rdd.getNumPartitions is very fast; I have never seen it take more than a second or so.
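
    If you want to verify that on your own data, you can time the call directly. The timed helper below is a hypothetical convenience, not part of this answer, and it assumes a DataFrame df is already in scope:

        // Hypothetical helper for rough wall-clock timing.
        def timed[T](label: String)(body: => T): T = {
          val t0 = System.nanoTime()
          val result = body
          val elapsedMs = (System.nanoTime() - t0) / 1e6
          println(f"$label took $elapsedMs%.1f ms")
          result
        }

        val n = timed("df.rdd.getNumPartitions") {
          df.rdd.getNumPartitions
        }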

    Alternatively, you could try

        import org.apache.spark.sql.functions.spark_partition_id

        // Count the distinct partition IDs observed across all rows.
        val numPartitions: Long = df
          .select(spark_partition_id())
          .distinct()
          .count()


    which avoids using .rdd. One caveat worth noting: spark_partition_id() is evaluated per row, so partitions containing no rows never appear in the result and the distinct count can undercount; this query also runs a full Spark job over the data, whereas getNumPartitions only reads partition metadata.
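
    A small sketch of that caveat, assuming a local SparkSession named spark (the row and partition counts are illustrative):

        import org.apache.spark.sql.functions.spark_partition_id

        // 3 rows spread over 8 partitions, so at least 5 partitions are empty.
        val tiny = spark.range(3).toDF("id").repartition(8)

        println(tiny.rdd.getNumPartitions)                            // 8
        println(tiny.select(spark_partition_id()).distinct().count()) // at most 3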
