How to merge two columns of a `DataFrame` in Spark into one 2-tuple?

野趣味 2020-12-14 22:14

I have a Spark DataFrame df with five columns. I want to add another column with its values being the tuple of the first and second columns. When u

4 Answers
  •  执笔经年
    2020-12-14 23:10

    You can use the `struct` function, which combines the provided columns into a single struct (tuple-like) column:

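    import org.apache.spark.sql.functions.struct
    import spark.implicits._  // needed for toDF; assumes a SparkSession named `spark`, as in spark-shell
    
    val df = Seq((1,2), (3,4), (5,3)).toDF("a", "b")
    // struct(...) packs columns a and b into one struct-typed column
    df.withColumn("NewColumn", struct(df("a"), df("b"))).show(false)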
    
    +---+---+---------+
    |a  |b  |NewColumn|
    +---+---+---------+
    |1  |2  |[1,2]    |
    |3  |4  |[3,4]    |
    |5  |3  |[5,3]    |
    +---+---+---------+
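
Once the columns are packed, you will usually want to read the values back out. The following is a minimal sketch, not part of the original answer. It assumes a local `SparkSession` built just for illustration (in spark-shell, `spark` already exists), and the names `withTuple`, `firstField`, and `expanded` are made up for this example. It relies on the struct fields keeping the source column names `a` and `b`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, struct}

// Assumption: a local session for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("struct-example")
  .getOrCreate()
import spark.implicits._

val df = Seq((1, 2), (3, 4), (5, 3)).toDF("a", "b")

// Pack columns a and b into one struct-typed column.
val withTuple = df.withColumn("NewColumn", struct(col("a"), col("b")))

// Print the schema to inspect the nested fields of NewColumn.
withTuple.printSchema()

// Read a single field back with dot notation...
val firstField = withTuple.select(col("NewColumn.a").as("first"))

// ...or expand every field of the struct into top-level columns.
val expanded = withTuple.select("NewColumn.*")
```

Dot notation is handy when only one element of the pair is needed, while `"NewColumn.*"` undoes the packing entirely.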
    
