Spark RDD mapping one row of data into multiple rows

Submitted by 吃可爱长大的小学妹 on 2019-12-22 10:59:58

Question


I have a text file with data that look like this:

Type1 1 3 5 9
Type2 4 6 7 8
Type3 3 6 9 10 11 25

I'd like to transform it into an RDD with rows like this:

1 Type1
3 Type1
3 Type3
......

I started with a case class:

case class MyData(uid: Int, gid: String)

I'm new to Spark and Scala, and I can't seem to find an example that does this.


Answer 1:


It seems you want something like this?

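rdd.flatMap { line =>
  // The first token is the group id; the remaining tokens are uids.
  val splitLine = line.split(' ').toList
  splitLine match {
    // Emit one MyData row per uid on the line.
    case gid :: rest => rest.map(x => MyData(x.toInt, gid))
    // Keep the match exhaustive: no tokens means no rows.
    case Nil => Nil
  }
}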
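
To try the idea without a Spark cluster, here is a minimal sketch that applies the same flatMap logic to a plain Scala List holding the sample lines from the question. The names `FlatMapSketch` and `parseLine` are hypothetical and only for illustration. The whitespace split and the final `sortBy` are assumptions, added to tolerate blank lines and to match the uid ordering shown in the question. With a real `RDD[String]`, the same function could be passed to `rdd.flatMap`.

```scala
object FlatMapSketch {
  // Case class from the question, written as valid Scala.
  case class MyData(uid: Int, gid: String)

  // Hypothetical helper: turns one input line into zero or more MyData rows.
  def parseLine(line: String): List[MyData] =
    line.trim.split("\\s+").toList match {
      case gid :: rest if gid.nonEmpty => rest.map(x => MyData(x.toInt, gid))
      case _                           => Nil // blank line: emit nothing
    }

  // Sample data from the question, held in a local List instead of an RDD.
  val lines: List[String] = List(
    "Type1 1 3 5 9",
    "Type2 4 6 7 8",
    "Type3 3 6 9 10 11 25"
  )

  // flatMap flattens the per-line lists into one collection of rows.
  // Sorting by uid is an assumption about the ordering the question wants.
  val rows: List[MyData] = lines.flatMap(parseLine).sortBy(_.uid)

  def main(args: Array[String]): Unit =
    rows.take(3).foreach(r => println(s"${r.uid} ${r.gid}"))
}
```

Running `FlatMapSketch.main` prints the first three rows, which for this sample are `1 Type1`, `3 Type1`, and `3 Type3`, the same as the desired output in the question.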


Source: https://stackoverflow.com/questions/31008169/spark-rdd-mapping-one-row-of-data-into-multiple-rows
