Spark add new column to dataframe with value from previous row

星月不相逢 2020-11-29 04:26

I'm wondering how I can achieve the following in Spark (PySpark):

Initial Dataframe:

+--+---+
|id|num|
+--+---+
|4 |9.0|
+--+---+
|3 |7.0|
+--+---+
|         


        
2 Answers
  •  情书的邮戳
    2020-11-29 05:03

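    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.lag
    // In spark-shell, spark.implicits._ (needed for toDF) is already in scope.

    val df = sc.parallelize(Seq((4, 9.0), (3, 7.0), (2, 3.0), (1, 5.0))).toDF("id", "num")
    df.show
    +---+---+
    | id|num|
    +---+---+
    |  4|9.0|
    |  3|7.0|
    |  2|3.0|
    |  1|5.0|
    +---+---+
    // w is assumed to be a window ordered by id, which matches the output below.
    val w = Window.orderBy("id")
    // lag("num", 1, 0) reads num from the row one position earlier in the window;
    // the first row has no predecessor, so it gets the default value 0.
    df.withColumn("new_column", lag("num", 1, 0).over(w)).show
    +---+---+----------+
    | id|num|new_column|
    +---+---+----------+
    |  1|5.0|       0.0|
    |  2|3.0|       5.0|
    |  3|7.0|       3.0|
    |  4|9.0|       7.0|
    +---+---+----------+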
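
To make it clearer what the lag call over an ordered window computes, here is a minimal plain-Python sketch of the same logic on the same four rows. It does not use Spark at all; the function name `lag_rows` and its `offset` and `default` parameters are made up for illustration and are not part of any Spark API.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, float]


def lag_rows(
    rows: List[Row], offset: int = 1, default: Optional[float] = 0.0
) -> List[Tuple[int, float, Optional[float]]]:
    """Hypothetical helper: mimic lag(num, offset, default) over a window ordered by id."""
    # Order by id first, so that "previous row" is well defined.
    ordered = sorted(rows, key=lambda r: r[0])
    nums = [num for _, num in ordered]
    result = []
    for i, (row_id, num) in enumerate(ordered):
        # Take num from the row `offset` positions earlier, or the default
        # when there is no such row (the first `offset` rows).
        prev = nums[i - offset] if i - offset >= 0 else default
        result.append((row_id, num, prev))
    return result


rows = [(4, 9.0), (3, 7.0), (2, 3.0), (1, 5.0)]
result = lag_rows(rows)
for row in result:
    print(row)
```

The printed tuples line up with the rows of the second table above. In PySpark the corresponding building blocks should be `pyspark.sql.functions.lag` and `pyspark.sql.Window.orderBy`, used the same way as their Scala counterparts in this answer.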
    
