How to convert DataFrame to RDD in Scala?

Submitted by Deadly on 2019-12-20 09:27:04

Question


Can someone please share how one can convert a dataframe to an RDD?


Answer 1:


Simply:

val rows: RDD[Row] = df.rdd
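
A minimal sketch of how the returned RDD[Row] can then be used, assuming an existing SparkSession named spark and a hypothetical two-column DataFrame:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

// hypothetical two-column DataFrame (names and ages)
val df = spark.createDataFrame(Seq(("alice", 30), ("bob", 25))).toDF("name", "age")

// the conversion from the answer above
val rows: RDD[Row] = df.rdd

// fields of a Row can be read by position or by name
val names: RDD[String] = rows.map(row => row.getAs[String]("name"))
names.collect().foreach(println)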



Answer 2:


Use df.rdd.map(row => ...) to convert the DataFrame to an RDD if you want to map each row to a different RDD element (in the Spark 1.x API, df.map itself returned an RDD). For example

df.rdd.map(row => (row(0), row(1)))

gives you a pair RDD where the first column of the df is the key and the second column is the value (Row fields are zero-indexed).
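
A short sketch of what can be done with such a pair RDD, assuming a hypothetical DataFrame whose first column is a String key:

// hypothetical: count occurrences per key using the pair RDD
val pairs = df.rdd.map(row => (row.getString(0), 1))
val counts = pairs.reduceByKey(_ + _)
counts.collect().foreach { case (k, n) => println(s"$k -> $n") }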




Answer 3:


I was looking for an answer to this myself and found this post.

Jean's answer is absolutely correct. Adding to that, df.rdd returns an RDD[Row]. I needed to apply split() once I had the RDD, so I first converted the RDD[Row] to an RDD[String]:

val opt = spark.sql("select tags from cvs").map(x => x.toString()).rdd
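
For example, a sketch (assuming the tags column holds comma-separated values) of applying split() to the resulting RDD[String]:

// Row.toString wraps the value in brackets, e.g. "[tag1,tag2]",
// so strip them before splitting on commas
val tags = opt.flatMap(s => s.stripPrefix("[").stripSuffix("]").split(","))
tags.collect().foreach(println)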


Source: https://stackoverflow.com/questions/32531224/how-to-convert-dataframe-to-rdd-in-scala
