Is there any way to get the number of elements in a spark RDD partition, given the partition ID? Without scanning the entire partition.
Something like this:
The following gives you a new RDD with elements that are the sizes of each partition:
rdd.mapPartitions(iter => Array(iter.size).iterator, true)