How to convert a DataFrame back to normal RDD in pyspark?

青春惊慌失措 2020-12-12 19:00

I need to use the

    rdd.partitionBy(npartitions, custom_partitioner)

method, which is not available on the DataFrame API. All of the DataFrame methods operate on, and return, DataFrames; how can I get a plain RDD back from the DataFrame's data?
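For context on what `partitionBy` needs: it operates on a pair RDD of `(key, value)` tuples, and the custom partitioner is just a function from a key to a partition index in `[0, npartitions)`. A minimal Spark-free sketch of such a partitioner (all names here are illustrative, not from the question):

```python
# Illustrative stand-in for a custom Spark partitioner: a plain
# function mapping each key to a partition index.
npartitions = 4

def custom_partitioner(key):
    # Route keys to partitions by a stable hash; Spark would call a
    # function like this once per record's key.
    return hash(key) % npartitions

pairs = [("a", 1), ("b", 2), ("a", 3)]  # a pair-RDD-like list of (key, value)
placement = {k: custom_partitioner(k) for k, _ in pairs}

# Every partition index falls in range:
assert all(0 <= p < npartitions for p in placement.values())
# The same key always lands in the same partition:
assert placement["a"] == custom_partitioner("a")
```

In actual PySpark this would look roughly like `df.rdd.map(lambda row: (row[0], row)).partitionBy(npartitions, custom_partitioner)`, since `partitionBy` requires key-value pairs.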

3 Answers
  • 2020-12-12 19:31

    @dapangmao's answer works, but it doesn't give a regular Spark RDD; it returns an RDD of Row objects. If you want the regular RDD format, try this:

    rdd = df.rdd.map(tuple)
    

    or

    rdd = df.rdd.map(list)
    
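    A `Row` behaves like a named tuple, which is why `map(tuple)` and `map(list)` strip the column names off each record. A small sketch of the two conversions using `collections.namedtuple` as a stand-in for `pyspark.sql.Row`, so no Spark session is needed:

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row, which is itself a tuple subclass.
Row = namedtuple("Row", ["name", "age"])
rows = [Row("alice", 30), Row("bob", 25)]  # what df.rdd would yield

# Equivalent of df.rdd.map(tuple): plain tuples, column names dropped.
as_tuples = [tuple(r) for r in rows]
assert as_tuples == [("alice", 30), ("bob", 25)]

# Equivalent of df.rdd.map(list): mutable lists instead of tuples.
as_lists = [list(r) for r in rows]
assert as_lists == [["alice", 30], ["bob", 25]]
```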
  • 2020-12-12 19:45

    Use the method .rdd like this:

    rdd = df.rdd
    
  • 2020-12-12 19:52

    The answer given by kennyut/Kistian works very well, but to get exact RDD-like output when the RDD consists of a list of attributes, e.g. [1,2,3,4], we can use the flatMap command as below:

    rdd = df.rdd.flatMap(list)
    
    or

    rdd = df.rdd.flatMap(lambda x: list(x))
    
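    The difference from `map(list)` is that `flatMap` concatenates the per-row lists into one flat stream of values instead of keeping one list per row. A Spark-free sketch of the two behaviours in plain Python (the input data is illustrative):

```python
from itertools import chain

rows = [(1, 2), (3, 4)]  # stand-in for an RDD of two Rows

# Equivalent of map(list): one list per row -> nested output.
mapped = [list(r) for r in rows]
assert mapped == [[1, 2], [3, 4]]

# Equivalent of flatMap(list): per-row lists flattened into one sequence.
flat = list(chain.from_iterable(list(r) for r in rows))
assert flat == [1, 2, 3, 4]
```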