Show partitions on a pyspark RDD


Question


The PySpark RDD documentation

http://spark.apache.org/docs/1.2.1/api/python/pyspark.html#pyspark.RDD

does not list any method for displaying partition information for an RDD.

Is there any way to get that information without running an extra step, e.g.:

myrdd.mapPartitions(lambda part: [1]).sum()  # one 1 per partition, summed

The above works, but it seems like unnecessary effort.
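For completeness, here is a minimal self-contained sketch of that workaround, assuming a local SparkContext; the app name and parallelism settings are illustrative:

from pyspark import SparkContext

sc = SparkContext("local[4]", "partition-count")  # hypothetical local context
myrdd = sc.parallelize(range(100), numSlices=8)

# mapPartitions runs once per partition; emitting a single 1 from each
# partition and summing the results yields the partition count.
print(myrdd.mapPartitions(lambda part: [1]).sum())  # 8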


Answer 1:


I missed it; it's very simple:

rdd.getNumPartitions()

I'm just not used to the Java-ish getFooMethod() style anymore ;)

Update: adding in the comment from @dnlbrky, for a DataFrame:

dataFrame.rdd.getNumPartitions()
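A short sketch tying both answers together, assuming a Spark 2.x-style local SparkSession (all names here are illustrative). glom() is also handy when you want to see partition contents, not just the count: it collects each partition's elements into a list.

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").appName("partition-info").getOrCreate()

rdd = spark.sparkContext.parallelize(range(10), numSlices=3)
print(rdd.getNumPartitions())      # 3

df = spark.createDataFrame([(i,) for i in range(10)], ["n"])
print(df.rdd.getNumPartitions())   # partition count chosen by Spark

# glom() groups each partition's elements into a list, exposing the layout.
print(rdd.glom().collect())        # e.g. [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]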


Source: https://stackoverflow.com/questions/29056079/show-partitions-on-a-pyspark-rdd
