apache-spark-1.3

How to get an Iterator of Rows using a DataFrame in Spark SQL

I have an application in Spark SQL which returns a large number of rows that are very difficult to fit in memory, so I am not able to use the collect function on the DataFrame. Is there a way I can get all these rows as an Iterable instead of materializing the entire result as a list?

Note: I am executing this Spark SQL application using yarn-client.

Generally speaking, transferring all the data to the driver is a pretty bad idea, and most of the time there is a better solution, but if you really want to go with this you can use the toLocalIterator method on the DataFrame's underlying RDD:

    val df: org.apache.spark.sql.DataFrame = ??? // your DataFrame here
    val rows: Iterator[org.apache.spark.sql.Row] = df.rdd.toLocalIterator
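As a fuller illustration, here is a minimal, self-contained sketch of the same idea. The Person case class, the sample data, and the local master are placeholders for this example only (the question itself runs under yarn-client); in a real job the DataFrame would come from your actual query or source:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.{Row, SQLContext}

    case class Person(name: String, age: Int)

    object LocalIteratorExample {
      def main(args: Array[String]): Unit = {
        // local[*] is just for the sketch; the question runs under yarn-client.
        val sc = new SparkContext(
          new SparkConf().setAppName("LocalIteratorExample").setMaster("local[*]"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Small stand-in for a large query result.
        val df = sc.parallelize(Seq(Person("alice", 30), Person("bob", 25))).toDF()

        // Cache first: toLocalIterator fetches one partition at a time, and
        // without caching each fetch can recompute the lineage from scratch.
        df.cache()

        // toLocalIterator streams partitions to the driver one at a time, so
        // driver memory only has to hold the largest partition, not the
        // whole result as collect would require.
        val rows: Iterator[Row] = df.rdd.toLocalIterator
        rows.foreach(println)

        sc.stop()
      }
    }

The trade-off is latency: each partition is shipped to the driver as a separate job, so iterating this way is slower than a single collect, but it bounds driver memory usage.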