How to convert a DataFrame back to normal RDD in pyspark?

青春惊慌失措 2020-12-12 19:00

I need to use the

    rdd.partitionBy(npartitions, custom_partitioner)

method, which is not available on the DataFrame API. All of the DataFrame methods operate on, and return, DataFrames; how can I get a plain RDD back from the DataFrame's data?
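For context on what `partitionBy` needs: it operates on a pair RDD of `(key, value)` tuples, and the custom partitioner is just a function from a key to a partition index in `[0, npartitions)`. A minimal Spark-free sketch of such a partitioner (all names here are illustrative, not from the question):

```python
# Illustrative stand-in for a custom Spark partitioner: a plain
# function mapping each key to a partition index.
npartitions = 4

def custom_partitioner(key):
    # Route keys to partitions by a stable hash; Spark would call a
    # function like this once per record's key.
    return hash(key) % npartitions

pairs = [("a", 1), ("b", 2), ("a", 3)]  # a pair-RDD-like list of (key, value)
placement = {k: custom_partitioner(k) for k, _ in pairs}

# Every partition index falls in range:
assert all(0 <= p < npartitions for p in placement.values())
# The same key always lands in the same partition:
assert placement["a"] == custom_partitioner("a")
```

In actual PySpark this would look roughly like `df.rdd.map(lambda row: (row[0], row)).partitionBy(npartitions, custom_partitioner)`, since `partitionBy` requires key-value pairs.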

3 Answers
  • 2020-12-12 19:31

    @dapangmao's answer works, but it doesn't give a regular Spark RDD; it returns an RDD of Row objects. If you want the regular RDD format, try this:

    rdd = df.rdd.map(tuple)
    

    or

    rdd = df.rdd.map(list)
    
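    A `Row` behaves like a named tuple, which is why `map(tuple)` and `map(list)` strip the column names off each record. A small sketch of the two conversions using `collections.namedtuple` as a stand-in for `pyspark.sql.Row`, so no Spark session is needed:

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row, which is itself a tuple subclass.
Row = namedtuple("Row", ["name", "age"])
rows = [Row("alice", 30), Row("bob", 25)]  # what df.rdd would yield

# Equivalent of df.rdd.map(tuple): plain tuples, column names dropped.
as_tuples = [tuple(r) for r in rows]
assert as_tuples == [("alice", 30), ("bob", 25)]

# Equivalent of df.rdd.map(list): mutable lists instead of tuples.
as_lists = [list(r) for r in rows]
assert as_lists == [["alice", 30], ["bob", 25]]
```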
  • 2020-12-12 19:45

    Use the method .rdd like this:

    rdd = df.rdd
    
  • 2020-12-12 19:52

    The answer given by kennyut/Kistian works very well, but to get exact RDD-like output when the RDD consists of a list of attributes, e.g. [1,2,3,4], we can use the flatMap command as below:

    rdd = df.rdd.flatMap(list)
    
    or

    rdd = df.rdd.flatMap(lambda x: list(x))
    
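    The difference from `map(list)` is that `flatMap` concatenates the per-row lists into one flat stream of values instead of keeping one list per row. A Spark-free sketch of the two behaviours in plain Python (the input data is illustrative):

```python
from itertools import chain

rows = [(1, 2), (3, 4)]  # stand-in for an RDD of two Rows

# Equivalent of map(list): one list per row -> nested output.
mapped = [list(r) for r in rows]
assert mapped == [[1, 2], [3, 4]]

# Equivalent of flatMap(list): per-row lists flattened into one sequence.
flat = list(chain.from_iterable(list(r) for r in rows))
assert flat == [1, 2, 3, 4]
```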