I need to split a multi-value column, i.e. a List[String], into separate rows, one row per list element.
The initial dataset has the following type: Dataset
Here's one way to do it:
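// Sample records: (id, text, value, multi-value list)
val myRDD = sc.parallelize(Array(
  (0, "text0", 1.0, List("prp1", "prp2", "prp3")),
  (1, "text1", 2.0, List("prp4", "prp5", "prp6")),
  (2, "text2", 3.0, List("prp7", "prp8", "prp9"))
)).map {
  // Key each record by its scalar fields so the list becomes the value
  case (i, t, v, ps) => ((i, t, v), ps)
}.flatMapValues(x => x).map {
  // flatMapValues emitted one pair per list element; flatten back to a 4-tuple
  case ((i, t, v), p) => (i, t, v, p)
}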
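
Since the question starts from a Dataset rather than an RDD, the same split can probably be done without leaving the Dataset API. The following is only a sketch under some assumptions: the case classes `Record` and `FlatRecord`, their field names, and the local `SparkSession` setup are all made up for illustration and do not come from the question. The list field is typed as `Seq[String]`, which accepts `List` values.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical row types; the names and fields are assumptions for illustration.
case class Record(id: Int, text: String, value: Double, props: Seq[String])
case class FlatRecord(id: Int, text: String, value: Double, prop: String)

object SplitListColumn {
  def main(args: Array[String]): Unit = {
    // Local session for the sketch; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("split-list-column")
      .getOrCreate()
    import spark.implicits._

    // Same sample data as the RDD version above.
    val ds = Seq(
      Record(0, "text0", 1.0, List("prp1", "prp2", "prp3")),
      Record(1, "text1", 2.0, List("prp4", "prp5", "prp6")),
      Record(2, "text2", 3.0, List("prp7", "prp8", "prp9"))
    ).toDS()

    // flatMap emits one FlatRecord per element of props,
    // repeating the scalar fields on every output row.
    val flat = ds.flatMap { r =>
      r.props.map(p => FlatRecord(r.id, r.text, r.value, p))
    }

    flat.show()
    spark.stop()
  }
}
```

With the three sample records above, each holding three list elements, `flat` should contain nine rows. If you would rather work with untyped columns, `org.apache.spark.sql.functions.explode` applied to the array column is another option.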