How to get an Iterator of Rows using Dataframe in SparkSQL

自闭症患者 2020-12-19 05:55

I have an application in SparkSQL which returns a large number of rows that are very difficult to fit in memory, so I will not be able to use the collect function on the DataFrame. Is there a way to get an iterator over the rows instead?

2 Answers
  •  庸人自扰
    2020-12-19 06:31

    Generally speaking, transferring all the data to the driver is a pretty bad idea, and most of the time there is a better solution, but if you really want to go this route you can use the toLocalIterator method on the underlying RDD:

    import org.apache.spark.sql.{DataFrame, Row}

    val df: DataFrame = ???
    df.cache() // Optional, to avoid repeated computation; see the docs for details
    val iter: Iterator[Row] = df.rdd.toLocalIterator
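
    For completeness, here is a minimal, self-contained sketch of how the iterator might be consumed; the SparkSession setup and the toy DataFrame are assumptions added for illustration, not part of the original answer:

    import org.apache.spark.sql.{Row, SparkSession}

    object LocalIteratorExample {
      def main(args: Array[String]): Unit = {
        // Hypothetical local session, only so the sketch runs standalone.
        val spark = SparkSession.builder()
          .appName("LocalIteratorExample")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Toy stand-in for the large query result from the question.
        val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "value")
        df.cache()

        // toLocalIterator pulls one partition at a time, so the driver only
        // needs memory for the largest single partition, not the whole result.
        val iter: Iterator[Row] = df.rdd.toLocalIterator
        iter.foreach { row =>
          println(s"id=${row.getInt(0)}, value=${row.getString(1)}")
        }

        spark.stop()
      }
    }

    Note that since Spark 2.0 the Dataset API also exposes toLocalIterator directly on the DataFrame itself (df.toLocalIterator()), which returns a java.util.Iterator[Row] with the same one-partition-at-a-time behavior.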
    
