Question
The pyspark RDD documentation
http://spark.apache.org/docs/1.2.1/api/python/pyspark.html#pyspark.RDD
does not list any method for displaying partition information for an RDD.
Is there any way to get that information without executing an additional step, e.g.:
myrdd.mapPartitions(lambda x: iter([1])).sum()
The above works (it emits a single 1 per partition and sums them), but it seems like extra effort.
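For completeness, a minimal self-contained sketch of that counting trick; the local SparkContext setup and variable names are illustrative assumptions, not part of the original question:

from pyspark import SparkContext

sc = SparkContext("local[4]", "partition-count-demo")  # illustrative local context
rdd = sc.parallelize(range(100), 8)                    # explicitly request 8 partitions

# Emit a single 1 per partition, then sum: the total equals the partition count.
print(rdd.mapPartitions(lambda part: iter([1])).sum())  # -> 8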
Answer 1:
I missed it: very simple:
rdd.getNumPartitions()
Not used to the java-ish getFooMethod() anymore ;)
Update: adding in the comment from @dnlbrky:
dataFrame.rdd.getNumPartitions()
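A minimal sketch showing both calls together; it assumes Spark 2.x+, where SparkSession is available (the question targets 1.2.1, but DataFrames only arrived in 1.3), and the names are illustrative:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[4]").appName("demo").getOrCreate()

# Partition count of a plain RDD.
rdd = spark.sparkContext.parallelize(range(100), 8)
print(rdd.getNumPartitions())     # -> 8

# Partition count of a DataFrame, via its underlying RDD.
df = spark.createDataFrame([(i,) for i in range(100)], ["n"])
print(df.rdd.getNumPartitions())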
Source: https://stackoverflow.com/questions/29056079/show-partitions-on-a-pyspark-rdd