Question
I have a data frame with a column containing JSON strings. Example below. There are 3 columns: a, b, c. Column c is StringType.
| a  | b   | c                        |
|----|-----|--------------------------|
| 77 | ABC | {"12549":38,"333513":39} |
| 78 | ABC | {"12540":38,"333513":39} |
I want to turn the JSON keys into columns of the data frame (a pivot), as in the example below:
| a  | b   | 12549 | 333513 | 12540 |
|----|-----|-------|--------|-------|
| 77 | ABC | 38    | 39     | null  |
| 78 | ABC | null  | 39     | 38    |
Answer 1:
This may not be the most efficient approach, since it has to read all of the JSON
records an extra time to infer the schema. If you can statically define the schema, it should perform better (see the sketch after the result below).
import org.apache.spark.sql.functions.from_json
import spark.implicits._

val data = spark.createDataset(Seq(
  (77, "ABC", "{\"12549\":38,\"333513\":39}"),
  (78, "ABC", "{\"12540\":38,\"333513\":39}")
)).toDF("a", "b", "c")

// Extra pass over the data: infer the schema from the JSON strings themselves
val schema = spark.read.json(data.select("c").as[String]).schema

data.select($"a", $"b", from_json($"c", schema).as("s")).select("a", "b", "s.*").show(false)
Result:
+---+---+-----+-----+------+
|a |b |12540|12549|333513|
+---+---+-----+-----+------+
|77 |ABC|null |38 |39 |
|78 |ABC|38 |null |39 |
+---+---+-----+-----+------+
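To avoid the extra read, here is a minimal sketch of the statically defined schema mentioned above, reusing data and the implicits import from the snippet and assuming the keys (12549, 12540, 333513) and a common Long value type are known up front. The MapType variant at the end is an alternative when the keys are not known in advance: it parses the JSON into a map and pivots on the map entries.

import org.apache.spark.sql.functions.{explode, first, from_json}
import org.apache.spark.sql.types._

// Static schema: skips the schema-inference pass.
// Assumes the key set and the value type (LongType) are known up front.
val staticSchema = StructType(Seq(
  StructField("12549", LongType),
  StructField("12540", LongType),
  StructField("333513", LongType)
))

data.select($"a", $"b", from_json($"c", staticSchema).as("s"))
  .select("a", "b", "s.*")
  .show(false)

// Alternative when keys vary per row: parse into a map, then pivot.
// All values must share one type for MapType to apply.
val mapSchema = MapType(StringType, LongType)
data.select($"a", $"b", explode(from_json($"c", mapSchema)))  // explode yields key/value columns
  .groupBy("a", "b")
  .pivot("key")
  .agg(first("value"))
  .show(false)

Both variants produce the same result as the inferred-schema version; the pivot one additionally tolerates new keys appearing in later rows without changing any code.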
Source: https://stackoverflow.com/questions/55337552/how-to-parse-json-column-in-dataframe-in-scala