Spark scala dataframe: Merging multiple columns into single column

我寻月下人不归 2021-01-06 12:11

I have a spark dataframe which looks something like below:

+---+------+----+
| id|animal|talk|
+---+------+----+
|  1|   bat|done|
|  2| mouse|mone|
|  3| horse| gun|
|  4| horse|some|
+---+------+----+


        
1 Answer
  • 2021-01-06 12:48

    Your expected output doesn't seem to reflect your requirement of producing a list of name-value structured objects. If I understand it correctly, consider using foldLeft to iteratively convert the wanted columns to StructType name-value columns, and group them into an ArrayType column:
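If `foldLeft` is unfamiliar: it threads an accumulator through a collection, applying the function once per element. A minimal plain-Scala sketch of the pattern (the string wrapping here is purely illustrative, mimicking how each `withColumn` call rewraps the DataFrame):

```scala
// The accumulator starts as "df"; each column name wraps it once,
// just as each withColumn call in the answer rewraps the DataFrame.
val colNames = List("animal", "talk")

val wrapped = colNames.foldLeft("df")((acc, c) => s"withColumn($acc, $c)")

println(wrapped)
// withColumn(withColumn(df, animal), talk)
```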

    import org.apache.spark.sql.functions._
    import spark.implicits._  // needed for toDF and the $-column syntax outside spark-shell
    
    val df = Seq(
      (1, "bat", "done"),
      (2, "mouse", "mone"),
      (3, "horse", "gun"),
      (4, "horse", "some")
    ).toDF("id", "animal", "talk")
    
    val cols = df.columns.filter(_ != "id")
    
    val resultDF = cols.
      foldLeft(df)( (accDF, c) => 
        accDF.withColumn(c, struct(lit(c).as("name"), col(c).as("value")))
      ).
      select($"id", array(cols.map(col): _*).as("merged"))
    
    resultDF.show(false)
    // +---+-----------------------------+
    // |id |merged                       |
    // +---+-----------------------------+
    // |1  |[[animal,bat], [talk,done]]  |
    // |2  |[[animal,mouse], [talk,mone]]|
    // |3  |[[animal,horse], [talk,gun]] |
    // |4  |[[animal,horse], [talk,some]]|
    // +---+-----------------------------+
    
    resultDF.printSchema
    // root
    //  |-- id: integer (nullable = false)
    //  |-- merged: array (nullable = false)
    //  |    |-- element: struct (containsNull = false)
    //  |    |    |-- name: string (nullable = false)
    //  |    |    |-- value: string (nullable = true)
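For comparison, the same `merged` column can also be built in a single `select` by mapping over the column names directly, skipping the intermediate `withColumn` rewrites. A sketch assuming Spark is on the classpath (the `SparkSession` setup and object name here are illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object MergeColumnsDirect {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("merge-demo").getOrCreate()
    import spark.implicits._

    val df = Seq(
      (1, "bat", "done"),
      (2, "mouse", "mone"),
      (3, "horse", "gun"),
      (4, "horse", "some")
    ).toDF("id", "animal", "talk")

    val cols = df.columns.filter(_ != "id")

    // Build each name-value struct inline and collect them into one array column.
    val direct = df.select(
      $"id",
      array(cols.map(c => struct(lit(c).as("name"), col(c).as("value"))): _*).as("merged")
    )

    direct.show(false)
    spark.stop()
  }
}
```

Both versions produce the same schema; the `foldLeft` variant is handy when each column needs a different transformation, while this form is shorter when the per-column logic is uniform.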
    