Convert RDD of Array(Row) to RDD of Row?

我只是一个虾纸丫 submitted on 2019-12-06 11:44:40

You just need to flatten your RDD:

yourRDD.flatMap(array => array)
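To see what the flattening does without a cluster, here is a minimal sketch that uses plain Scala collections in place of the RDD. That substitution is an assumption for illustration only; the names nested and flat are hypothetical, and RDD.flatMap follows the same map-then-concatenate semantics.

```scala
// Plain Scala collections stand in for the RDD here (illustrative assumption).
val nested: Seq[Array[String]] = Seq(Array("aaa", "bbb"), Array("ccc"))

// flatMap with an identity function concatenates the inner arrays
// into a single flat sequence.
val flat: Seq[String] = nested.flatMap(array => array)

println(flat.mkString(","))
```

Each inner array contributes its elements, in order, to the single resulting sequence.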

Applied to your code (with some errors fixed in the inner map and in the assignment of id and str):

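import org.apache.spark.sql.Row

fileWithIdRDD.map(x => {
  val id = x._1
  val str = x._2
  // Split the pipe-delimited string into its tokens
  val strArr = str.split("\\|")
  // Build one Row per token, each carrying the record's id
  val rowArr = strArr.map(y => {
    Row(id, y)
  })
  rowArr
}).flatMap(array => array)  // flatten RDD[Array[Row]] into RDD[Row]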
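Since map followed by flatMap(array => array) is equivalent to a single flatMap, the two steps could probably be collapsed. The following is a sketch under assumptions: the helper name toRows is hypothetical, and the input is assumed to be an RDD[(Int, String)], as in the example further down.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

// Hypothetical helper; assumes the RDD[(Int, String)] shape of fileWithIdRDD.
def toRows(fileWithIdRDD: RDD[(Int, String)]): RDD[Row] =
  fileWithIdRDD.flatMap { case (id, str) =>
    // Split on a literal pipe and emit one Row per token.
    str.split("\\|").map(token => Row(id, token))
  }
```

This avoids materializing the intermediate RDD of arrays, although the result should be the same as with the two-step version.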

Quick example here:

INPUT

scala> fileWithIdRDD.collect
res30: Array[(Int, String)] = Array((0,aaa|bbb|ccc), (1,ddd|eee|fff|ggg))

EXECUTION

scala> fileWithIdRDD.map(x => {
         val id = x._1
         val str = x._2
         val strArr = str.split("\\|")
         val rowArr = strArr.map(y => {
           Row(id, y)
         })
         rowArr
       }).flatMap(array => array)

res31: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[17] at flatMap at <console>:35

OUTPUT

scala> res31.collect
res32: Array[org.apache.spark.sql.Row] = Array([0,aaa], [0,bbb], [0,ccc], [1,ddd], [1,eee], [1,fff], [1,ggg])