Question
My downstream consumer does not support a Map type, but my source does and sends one, so I need to convert this map into an array of structs (tuples).
Scala supports Map.toArray, which creates an array of tuples for you; that seems like the function I need to apply to the Map:
{
  "a" : {
    "b": {
      "key1" : "value1",
      "key2" : "value2"
    },
    "b_" : {
      "array": [
        {
          "key": "key1",
          "value" : "value1"
        },
        {
          "key": "key2",
          "value" : "value2"
        }
      ]
    }
  }
}
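As a quick check of the Map.toArray behavior mentioned above, here is a minimal plain-Scala sketch (no Spark involved; the variable names are illustrative only):

```scala
// Map.toArray turns a Map into an Array of (key, value) tuples.
val m = Map("key1" -> "value1", "key2" -> "value2")

val entries: Array[(String, String)] = m.toArray
// Each element is a Tuple2, e.g. ("key1", "value1").
```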
What is the most efficient way to do this in Spark, given that the field to convert is also a nested one? E.g.:
a is the root-level DataFrame column
a.b is the map at level 1 (comes from the source)
a.b_ is the array of structs (this is what I want to generate by converting a.b)
The answer so far goes some of the way, I think; I just can't get the suggested withColumn and UDF to generate the structure shown above.
Thanks!
Answer 1:
Just use a udf:
val toArray = udf((vs: Map[String, String]) => vs.toArray)
and adjust the input type according to your needs.
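A sketch of how this udf could be applied to the nested field from the question, rebuilding the struct a so that b_ sits alongside b. The DataFrame name df, the case class Entry, and the column names are assumptions taken from the question's example; a plain tuple would serialize as a struct with fields _1/_2, so a case class is used here to get the "key"/"value" field names shown in the question's JSON:

```scala
import org.apache.spark.sql.functions.{col, struct, udf}

// Case class so the resulting struct fields are named "key" and "value"
// rather than the default tuple names _1 and _2.
case class Entry(key: String, value: String)

val toArray = udf((vs: Map[String, String]) =>
  vs.toArray.map { case (k, v) => Entry(k, v) }
)

// Rebuild the root struct "a", keeping "b" and adding "b_" next to it.
val result = df.withColumn(
  "a",
  struct(col("a.b").as("b"), toArray(col("a.b")).as("b_"))
)
```

As a side note, newer Spark versions also ship a built-in map_entries function (available as a SQL expression since 2.4, and as functions.map_entries in the Scala API since 3.0) that converts a map column into an array of key/value structs without a udf.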
Source: https://stackoverflow.com/questions/43963273/spark-dataframe-generate-an-array-of-tuple-from-a-map-type