Converting a Scala Iterable[tuple] to RDD

核能气质少年 submitted on 2019-12-08 15:48:28

Question


I have a list of tuples of type (String, String, Int, Double) that I want to convert to a Spark RDD.

In general, how do I convert a Scala Iterable[(a1, a2, a3, ..., an)] into a Spark RDD?


Answer 1:


There are a few ways to do this, but the most straightforward is to use the SparkContext:

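import org.apache.spark._
import org.apache.spark.rdd._
import org.apache.spark.SparkContext._  // implicit conversions; only needed before Spark 1.3

// sc is an existing SparkContext; yourIterable is your Iterable of tuples
sc.parallelize(yourIterable.toList)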

sc.parallelize expects a Seq, so the Iterable has to be converted first (toList or toSeq both work). The element type is preserved, so you still get an RDD[(String, String, Int, Double)].
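
For the general Iterable[(a1, a2, ..., an)] case, a minimal sketch could look like the following. The object name IterableToRdd, the helper toRdd, the app name, and the local[*] master are assumptions made for illustration and are not part of the original answer. The ClassTag context bound is there because sc.parallelize requires one for the element type.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object IterableToRdd {
  // Hypothetical helper: works for any element type T, including tuples of any arity.
  // parallelize takes a Seq, so the Iterable is converted with toSeq first.
  def toRdd[T: ClassTag](sc: SparkContext, rows: Iterable[T]): RDD[T] =
    sc.parallelize(rows.toSeq)

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, for trying this outside a cluster.
    val conf = new SparkConf().setAppName("iterable-to-rdd").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val rows: Iterable[(String, String, Int, Double)] =
        Iterable(("a", "x", 1, 1.5), ("b", "y", 2, 2.5))

      // The static type is RDD[(String, String, Int, Double)].
      val rdd: RDD[(String, String, Int, Double)] = toRdd(sc, rows)
      println(rdd.count())
    } finally {
      sc.stop()
    }
  }
}
```

Note that parallelize copies the whole collection from the driver, so this approach only suits data that already fits in driver memory.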



Source: https://stackoverflow.com/questions/33284507/converting-a-scala-iterabletuple-to-rdd
