Spark Dataframe: Generate an Array of Tuple from a Map type


Question


My downstream system does not support a Map type, but my source does and sends one. I need to convert this map into an array of structs (tuples).

Scala supports Map.toArray, which creates an array of tuples for you; that seems like the operation I need to apply to the Map. The target shape is:

{
  "a" : {
    "b": {
      "key1" : "value1",
      "key2" : "value2"
    },
    "b_" : {
      "array": [
        {
          "key": "key1",
          "value" : "value1"
        },
        {
          "key": "key2",
          "value" : "value2"
        }
      ]
    }
  }
}
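For reference, here is a minimal plain-Scala sketch of what Map.toArray produces (the keys and values are made up to mirror the JSON above):

  val m = Map("key1" -> "value1", "key2" -> "value2")
  m.toArray  // Array((key1,value1), (key2,value2)), i.e. Array[(String, String)]

When such an array comes back out of a Spark UDF, each tuple is represented as a struct, giving the array-of-struct shape shown under b_ (though with field names _1 and _2 rather than key and value).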

What is the most efficient way to do this in Spark, given that the field to change is also nested? For example:

a is the root-level DataFrame column

a.b is the map at level 1 (this comes from the source)

a.b_ is the array of structs (this is what I want to generate by converting a.b into an array)

The answer so far goes some of the way, I think; I just can't get the suggested withColumn and UDF to generate the structure shown above.

Thanks!


Answer 1:


Just use a UDF:

import org.apache.spark.sql.functions.udf

val toArray = udf((vs: Map[String, String]) => vs.toArray)

and adjust input type according to your needs.
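To apply this to the nested field from the question, you cannot add a.b_ directly with withColumn; you have to rebuild the struct a. A minimal sketch, assuming the schema from the question (a is a struct containing the map b) and a hypothetical DataFrame named df:

  import org.apache.spark.sql.functions.{col, struct, udf}

  // Same UDF as above, repeated here for completeness: Map.toArray yields
  // an array of tuples, which Spark exposes as an array of structs
  // with fields _1 and _2.
  val toArray = udf((vs: Map[String, String]) => vs.toArray)

  // Rebuild the struct `a`, keeping the original map `b` and adding the
  // converted array as `b_`.
  val result = df.withColumn(
    "a",
    struct(
      col("a.b").as("b"),
      toArray(col("a.b")).as("b_")
    )
  )

If you need the struct fields to be named key and value (as in the desired JSON) rather than _1 and _2, you can have the UDF return a sequence of a case class with those field names, or use the built-in map_entries function (available as a SQL function since Spark 2.4), which yields an array of key/value structs without a UDF.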



Source: https://stackoverflow.com/questions/43963273/spark-dataframe-generate-an-array-of-tuple-from-a-map-type
