Convert Array of String column to multiple columns in spark scala

慢半拍i asked on 2020-12-10 23:02

I have a dataframe with the following schema:

id         : int,
emp_details: Array(String)

Some sample data:

1, Array(empname=xxx, city=yyy, zip=12345)
2, Array(empname=bbb, city=bbb, zip=22345)
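
For reference, a minimal sketch of how such a DataFrame could be built in spark-shell; the literal values are taken from the answer's output below, and df1 is the name the answer uses:

import spark.implicits._
import org.apache.spark.sql.functions._   // provides split and udf used in the answer below

// id plus an array of "key=value" strings, mirroring the schema above
val df1 = Seq(
  (1, Array("empname=xxx", "city=yyy", "zip=12345")),
  (2, Array("empname=bbb", "city=bbb", "zip=22345"))
).toDF("id", "emp_details")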
1 Answer
  • 2020-12-10 23:42

    You can use withColumn and split to extract the required fields:

    df1.withColumn("empname", split($"emp_details" (0), "=")(1))
      .withColumn("city", split($"emp_details" (1), "=")(1))
      .withColumn("zip", split($"emp_details" (2), "=")(1)) 
    

    Output:

    +---+----------------------------------+-------+----+-----+
    |id |emp_details                       |empname|city|zip  |
    +---+----------------------------------+-------+----+-----+
    |1  |[empname=xxx, city=yyy, zip=12345]|xxx    |yyy |12345|
    |2  |[empname=bbb, city=bbb, zip=22345]|bbb    |bbb |22345|
    +---+----------------------------------+-------+----+-----+
    

    UPDATE:
    If the key=value pairs do not appear in a fixed order within the array, you can use a UDF that converts the array to a map and looks up each key:

    val getColumnsUDF = udf((details: Seq[String]) => {
      // build a Map("empname" -> ..., "city" -> ..., "zip" -> ...) from the key=value strings
      val detailsMap = details.map(_.split("=")).map(x => (x(0), x(1))).toMap
      (detailsMap("empname"), detailsMap("city"), detailsMap("zip"))
    })
    

    Now apply the UDF and unpack the resulting tuple:

    df1.withColumn("emp",getColumnsUDF($"emp_details"))
     .select($"id", $"emp._1".as("empname"), $"emp._2".as("city"), $"emp._3".as("zip"))
     .show(false)
    

    Output:

    +---+-------+----+-----+
    |id |empname|city|zip  |
    +---+-------+----+-----+
    |1  |xxx    |yyy |12345|
    |2  |bbb    |bbb |22345|
    +---+-------+----+-----+
    

    Hope this helps!
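
    As a side note, on Spark 2.4+ the same reshaping can be done without a UDF, for example by joining the array back into one string and parsing it with the built-in str_to_map SQL function. This is a minimal sketch under that assumption, not part of the original answer:

    // assumes Spark 2.4+ (array_join) and the built-in str_to_map function
    import org.apache.spark.sql.functions.{col, expr}

    val mapped = df1.withColumn("m",
      expr("str_to_map(array_join(emp_details, ','), ',', '=')"))

    mapped.select(
      col("id"),
      col("m").getItem("empname").as("empname"),
      col("m").getItem("city").as("city"),
      col("m").getItem("zip").as("zip")
    ).show(false)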
