How to convert an Iterable to an RDD

戏子无情 posted on 2021-02-07 10:45:26

Question


To be more specific, how can I convert a scala.Iterable to an org.apache.spark.rdd.RDD?

I have an RDD of (String, Iterable[(String, Integer)]) and I want to convert it into an RDD of (String, RDD[(String, Integer)]), so that I can apply reduceByKey to the inner RDD.

For example, I have an RDD where the key is the two-letter prefix of a person's name and the value is a List of pairs, each holding a person's name and the hours they spent at an event.

My RDD is:

("To", List(("Tom",50),("Tod",30),("Tom",70),("Tod",25),("Tod",15)))
("Ja", List(("Jack",50),("James",30),("Jane",70),("James",25),("Jasper",15)))

I need the List to be converted to an RDD so that I can accumulate each person's total hours by applying reduceByKey, giving a result like:

("To", RDD(("Tom",120),("Tod",70)))
("Ja", RDD(("Jack",50),("James",55),("Jane",70),("Jasper",15)))

But I couldn't find any such transformation function. How can I do this?

Thanks in advance.


Answer 1:


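Spark does not support nested RDDs, so an RDD of (String, RDD[...]) cannot be built. You can get the same per-person totals by using flatMap and reduceByKey and keeping the inner result as a List. Something like this: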

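// Flatten to ((prefix, name), hours) so that each person gets a key of their own
rdd.flatMap { case (key, list) => list.map { case (name, hours) => ((key, name), hours) } }
   .reduceByKey(_ + _)   // total hours per (prefix, name)
   .map { case ((key, name), hours) => (key, List((name, hours))) }
   .reduceByKey(_ ++ _)  // collect the per-person totals back under each prefix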
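
To try the pipeline above end to end, here is a minimal sketch that runs it on the sample data from the question. It assumes spark-core is on the classpath, a local master, and Int rather than Integer for the hours. The names HoursByPrefix, totals and totalsPerGroup are made up for this illustration. totalsPerGroup is an alternative that is not part of the answer: because each value is already a local Iterable, it can be summed with ordinary collection operations inside mapValues, provided each group fits in memory on one executor.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical object and method names, used only for this illustration.
object HoursByPrefix {

  // The flatMap + reduceByKey approach from the answer.
  def totals(rdd: RDD[(String, Iterable[(String, Int)])]): RDD[(String, List[(String, Int)])] =
    rdd
      .flatMap { case (prefix, people) =>
        // Re-key every entry by (prefix, name).
        people.map { case (name, hours) => ((prefix, name), hours) }
      }
      .reduceByKey(_ + _) // total hours per (prefix, name)
      .map { case ((prefix, name), hours) => (prefix, List((name, hours))) }
      .reduceByKey(_ ++ _) // gather the totals back under each prefix

  // Alternative (an assumption, not from the answer): aggregate each
  // Iterable locally, without reshuffling the data.
  def totalsPerGroup(rdd: RDD[(String, Iterable[(String, Int)])]): RDD[(String, Map[String, Int])] =
    rdd.mapValues { people =>
      people.groupBy(_._1).map { case (name, entries) => (name, entries.map(_._2).sum) }
    }

  def main(args: Array[String]): Unit = {
    // Local master, assumed here so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("hours-by-prefix").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val data: Seq[(String, Iterable[(String, Int)])] = Seq(
        ("To", List(("Tom", 50), ("Tod", 30), ("Tom", 70), ("Tod", 25), ("Tod", 15))),
        ("Ja", List(("Jack", 50), ("James", 30), ("Jane", 70), ("James", 25), ("Jasper", 15)))
      )
      val input = sc.parallelize(data)
      totals(input).collect().foreach(println)
      totalsPerGroup(input).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Both versions should yield totals of 120 for Tom, 70 for Tod, 50 for Jack, 55 for James, 70 for Jane and 15 for Jasper. The order of the entries within each List or Map, and the order of the two keys, is not guaranteed.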


Source: https://stackoverflow.com/questions/37208871/how-to-convert-an-iterable-to-an-rdd
