Cannot access PipelinedRDD in PySpark [duplicate]
Question

This question already has answers here: pyspark: 'PipelinedRDD' object is not iterable (2 answers). Closed last year.

I am trying to implement K-means from scratch using PySpark. I perform various operations on RDDs, but when I try to display the result of the final processed RDD, I get an error along the lines of "Pipelined RDDs can't be iterated". Calls such as .collect() do not work either, again because of the PipelinedRDD issue.

from __future__ import print_function
import sys