How to split multi-value column into separate rows using typed Dataset?

無奈伤痛 2021-01-05 07:17

I need to split a multi-value column, i.e. a List[String], into separate rows (one row per list element).

The initial dataset has the following type: Dataset

3 Answers
  •  没有蜡笔的小新
    2021-01-05 07:48

    Here's one way to do it:

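    // Run in spark-shell, where `sc` is the predefined SparkContext.
    // Result: RDD[(Int, String, Double, String)], one row per list element.
    val myRDD = sc.parallelize(Array(
      (0, "text0", 1.0, List("prp1", "prp2", "prp3")),
      (1, "text1", 2.0, List("prp4", "prp5", "prp6")),
      (2, "text2", 3.0, List("prp7", "prp8", "prp9"))
    )).map {
      // Key each record by its scalar fields; the list becomes the value,
      // so flatMapValues can emit one (key, element) pair per list element
      case (i, t, v, ps) => ((i, t, v), ps)
    }.flatMapValues(x => x).map {
      // Unpack the key again to get a flat 4-tuple
      case ((i, t, v), p) => (i, t, v, p)
    }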
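The question asks about a typed Dataset rather than an RDD. A minimal sketch of the same reshaping with `Dataset.flatMap`, assuming Spark 2.x or later and two hypothetical case classes `Record` and `ExplodedRecord` (the question's actual Dataset type parameter is not shown, so the fields here just mirror the tuples used in the RDD answer), might look like this:

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Hypothetical record types: the question's exact Dataset type was lost,
// so these fields simply mirror the tuples used in the RDD answer.
case class Record(id: Int, text: String, value: Double, props: List[String])
case class ExplodedRecord(id: Int, text: String, value: Double, prop: String)

object ExplodeDataset {
  // Explicit encoder for the output type, so explodeProps does not
  // depend on spark.implicits._ being imported at the call site.
  implicit val explodedEncoder: Encoder[ExplodedRecord] = Encoders.product[ExplodedRecord]

  // Emit one ExplodedRecord per element of `props`; records whose
  // list is empty produce no output rows.
  def explodeProps(ds: Dataset[Record]): Dataset[ExplodedRecord] =
    ds.flatMap(r => r.props.map(p => ExplodedRecord(r.id, r.text, r.value, p)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("explode-typed-dataset")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // provides toDS() and the Encoder[Record]

    val ds: Dataset[Record] = Seq(
      Record(0, "text0", 1.0, List("prp1", "prp2", "prp3")),
      Record(1, "text1", 2.0, List("prp4", "prp5", "prp6")),
      Record(2, "text2", 3.0, List("prp7", "prp8", "prp9"))
    ).toDS()

    explodeProps(ds).show() // 9 rows, one per prpN value
    spark.stop()
  }
}
```

Using `flatMap` keeps the result typed end to end. An alternative is the untyped `explode` function from `org.apache.spark.sql.functions` followed by `.as[...]` to get back to a typed Dataset.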
    
