I need to use the
(rdd.)partitionBy(npartitions, custom_partitioner)
method that is not available on the DataFrame. All of the DataFrame
@dapangmao's answer works, but it doesn't give the regular spark RDD, it returns a Row object. If you want to have the regular RDD format.
Try this:
rdd = df.rdd.map(tuple)
or
rdd = df.rdd.map(list)
Use the method .rdd
like this:
rdd = df.rdd
Answer given by kennyut/Kistian works very well but to get exact RDD like output when RDD consist of list of attributes e.g. [1,2,3,4] we can use flatmap command as below,
rdd = df.rdd.flatMap(list)
or
rdd = df.rdd.flatmap(lambda x: list(x))