Convert a Spark DataFrame column containing an array of JSON objects into multiple rows

甜味超标 2021-01-27 01:48

I have streaming JSON data whose structure can be described with the case class below:

case class Hello(A: String, B: Array[Map[String, String]])
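For concreteness, a single record matching this case class might look like the following. The JSON payload and the values are illustrative, not taken from the question:

```scala
case class Hello(A: String, B: Array[Map[String, String]])

// A hypothetical JSON record this case class would deserialize from:
// {"A": "ABC", "B": [{"C": "1", "D": "2"}]}
val sample = Hello("ABC", Array(Map("C" -> "1", "D" -> "2")))

// Desired result: one output row per element of B, with the map keys
// (C, D) pivoted into their own columns alongside A.
```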
2 Answers
  •  清歌不尽
    2021-01-27 02:14

    Not sure if this is the best approach, but it can be done in a two-step process. Leaving your case class aside, the following:

    import org.apache.spark.sql.functions._
    import spark.implicits._ // needed for toDF and the $ column syntax
    //case class ComponentPlacement(A: String, B: Array[Map[String, String]])
    val df = Seq (
                  ("ABC", List(Map("C" -> "1",  "D" -> "2"))),
                  ("XYZ", List(Map("C" -> "11", "D" -> "22")))
                 ).toDF("A", "B")
    
    // Step 1: explode the array, producing one row per map
    val df2 = df.select($"A", explode($"B")).toDF("A", "Bn")
    
    // Step 2: explode each map, producing one row per key/value pair
    val df3 = df2.select($"A", explode($"Bn")).toDF("A", "B", "C")
    
    // Pivot the keys back into columns, taking the first value per key
    val df4 = df3.groupBy("A").pivot("B").agg(first($"C"))
    

    returns:

    +---+---+---+
    |  A|  C|  D|
    +---+---+---+
    |XYZ| 11| 22|
    |ABC|  1|  2|
    +---+---+---+
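    As a mental model for what the two `explode` calls and the pivot do, the same transformation can be sketched on plain Scala collections (this is only an illustration of the row-level logic, not Spark code):

    ```scala
    val rows = Seq(
      ("ABC", List(Map("C" -> "1",  "D" -> "2"))),
      ("XYZ", List(Map("C" -> "11", "D" -> "22")))
    )

    // explode(B): one row per map in the array
    val exploded = rows.flatMap { case (a, maps) => maps.map(m => (a, m)) }

    // explode(Bn): one row per key/value pair
    val kv = exploded.flatMap { case (a, m) => m.map { case (k, v) => (a, k, v) } }

    // groupBy("A").pivot("B").agg(first("C")): collect the pairs back into
    // one map of columns per value of A
    val pivoted = kv.groupBy(_._1).map { case (a, xs) =>
      a -> xs.map(t => t._2 -> t._3).toMap
    }
    ```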
    
