Adding previous row with current row using Window function

Submitted by 匆匆过客 on 2019-12-08 11:44:20

Question


I have a Spark DataFrame in which I want to calculate a running total: for each row, the current row's amount plus the sum of the amounts in the previous rows, based on group id and id. Here is the DataFrame:

import findspark
findspark.init()  # Locate the local Spark installation before importing pyspark

import pandas as pd
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

data1 = {'date': {0: '2018-04-03', 1: '2018-04-04', 2: '2018-04-05', 3: '2018-04-06', 4: '2018-04-07'},
         'id': {0: 'id1', 1: 'id2', 2: 'id1', 3: 'id3', 4: 'id2'},
         'group': {0: '1', 1: '1', 2: '1', 3: '2', 4: '1'},
         'amount': {0: 50, 1: 40, 2: 50, 3: 55, 4: 20}}
df1_pd = pd.DataFrame(data1, columns=data1.keys())

df1 = spark.createDataFrame(df1_pd)
df1.show()


+----------+---+-----+------+
|      date| id|group|amount|
+----------+---+-----+------+
|2018-04-03|id1|    1|    50|
|2018-04-04|id2|    1|    40|
|2018-04-05|id1|    1|    50|
|2018-04-06|id3|    2|    55|
|2018-04-07|id2|    1|    20|
+----------+---+-----+------+

The output I am looking for:

+----------+---+-----+------+---+
|      date| id|group|amount|sum|
+----------+---+-----+------+---+
|2018-04-03|id1|    1|    50| 50|
|2018-04-04|id2|    1|    40| 90|
|2018-04-05|id1|    1|    50|140|
|2018-04-06|id3|    2|    55| 55|
|2018-04-07|id2|    1|    20|160|
+----------+---+-----+------+---+

Answer 1:


Window definition:

from pyspark.sql.window import Window
from pyspark.sql.functions import sum  # Note: shadows Python's built-in sum

w = Window.partitionBy("group").orderBy("date").rowsBetween(
    Window.unboundedPreceding,  # Start at the first row of the partition
    Window.currentRow           # End at the current row (inclusive)
)
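
The explicit rowsBetween matters when several rows in a partition share the same date. Without an explicit frame, a window that has an orderBy defaults to a range frame, which also includes rows tied with the current row on the ordering value. The plain-Python sketch below contrasts the two behaviours; the data and the helper names rows_frame_sums and range_frame_sums are made up for illustration and are not part of the PySpark API.

```python
def rows_frame_sums(pairs):
    """Running sum over a ROWS frame: each row sees the rows before it plus itself."""
    total, out = 0, []
    for _, amount in pairs:
        total += amount
        out.append(total)
    return out


def range_frame_sums(pairs):
    """Running sum over a RANGE frame: each row also sees rows tied on the ordering key."""
    return [sum(a for k, a in pairs if k <= key) for key, _ in pairs]


# Hypothetical (date, amount) pairs, already sorted by date.
# Two rows share the date 2018-04-04.
pairs = [
    ("2018-04-03", 50),
    ("2018-04-04", 40),
    ("2018-04-04", 10),
    ("2018-04-05", 20),
]

print(rows_frame_sums(pairs))   # [50, 90, 100, 120]
print(range_frame_sums(pairs))  # [50, 100, 100, 120]
```

With unique dates per group, as in the question's data, both frames give the same result.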

Sum:

(df1.withColumn("sum", sum("amount").over(w))
    .orderBy("date")   # Sort for easy inspection. Not necessary
    .show())

Result:

+----------+---+-----+------+---+
|      date| id|group|amount|sum|
+----------+---+-----+------+---+
|2018-04-03|id1|    1|    50| 50|
|2018-04-04|id2|    1|    40| 90|
|2018-04-05|id1|    1|    50|140|
|2018-04-06|id3|    2|    55| 55|
|2018-04-07|id2|    1|    20|160|
+----------+---+-----+------+---+
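
To make clear what this window computes, here is a plain-Python sketch (no Spark required) that reproduces the result for the question's data: rows are visited in date order, and each row receives the sum of all amounts seen so far in its group. The function name running_sum_by_group is hypothetical and only illustrates the semantics; it says nothing about how Spark executes the query.

```python
from collections import defaultdict

rows = [
    {"date": "2018-04-03", "id": "id1", "group": "1", "amount": 50},
    {"date": "2018-04-04", "id": "id2", "group": "1", "amount": 40},
    {"date": "2018-04-05", "id": "id1", "group": "1", "amount": 50},
    {"date": "2018-04-06", "id": "id3", "group": "2", "amount": 55},
    {"date": "2018-04-07", "id": "id2", "group": "1", "amount": 20},
]


def running_sum_by_group(rows):
    """Emulate a per-group running sum ordered by date (illustrative helper)."""
    totals = defaultdict(int)  # running total per group
    result = []
    for row in sorted(rows, key=lambda r: r["date"]):
        totals[row["group"]] += row["amount"]
        result.append({**row, "sum": totals[row["group"]]})
    return result


for r in running_sum_by_group(rows):
    print(r["date"], r["id"], r["group"], r["amount"], r["sum"])
```

The last column printed is 50, 90, 140, 55, 160, matching the sum column above.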


Source: https://stackoverflow.com/questions/50144313/adding-previous-row-with-current-row-using-window-function
