Question
I have a table with the following two columns:
Initial Table
Date        Value
-------------------
2019.01.01 | 150  
2019.01.02 | 100  
2019.01.04 | 200  
2019.01.07 | 300  
2019.01.08 | 100  
2019.01.10 | 150  
2019.01.14 | 200  
2019.01.15 | 100  
For each row, I would like to sum the values from the previous N days (including the current row's date). In this case, N = 5.
Resultant Table
Date        Value  Sum
------------------------
2019.01.01 | 150 | 150 (01 -> ..)
2019.01.02 | 100 | 250 (02 -> 01)
2019.01.04 | 200 | 450 (04 -> 01)
2019.01.07 | 300 | 600 (07 -> 02)
2019.01.08 | 100 | 600 (08 -> 04)
2019.01.10 | 150 | 550 (10 -> 07)
2019.01.14 | 200 | 350 (14 -> 10)
2019.01.15 | 100 | 450 (15 -> 10)
Query
t:([] Date: 2019.01.01 2019.01.02 2019.01.04 2019.01.07 2019.01.08 2019.01.10 2019.01.14 2019.01.15; Value: 150 100 200 300 100 150 200 100)
How can I go about doing that?
Answer 1:
A window join is a pretty natural fit here. See: https://code.kx.com/v2/ref/wj/
q)wj1[-5 0+\:t`Date;`Date;t;(t;(sum;`Value))]
Date       Value
----------------
2019.01.01 150
2019.01.02 250
2019.01.04 450
2019.01.07 600
2019.01.08 600
2019.01.10 550
2019.01.14 350
2019.01.15 450
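Note that the result above overwrites Value with the windowed sum. If the original Value column should be kept next to the sum, as in the table in the question, one possible sketch is to rename the summed column and join it back row by row. The names s and Sum are only illustrative. wj1 is used because it aggregates only rows that fall inside the window, whereas wj would also include the prevailing value on entry to the window.

```q
t:([] Date:2019.01.01 2019.01.02 2019.01.04 2019.01.07 2019.01.08 2019.01.10 2019.01.14 2019.01.15; Value:150 100 200 300 100 150 200 100)
/ same window join as above; rename its summed column to Sum
s:select Sum:Value from wj1[-5 0+\:t`Date;`Date;t;(t;(sum;`Value))]
/ join row by row onto the original table, giving columns Date, Value, Sum
t,'s
```

The Sum column should hold the same figures as the Value column in the output above.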
To go back 5 observations rather than 5 calendar days you could do:
q)wj1[{(4 xprev x;x)}t`Date;`Date;t;(t;(sum;`Value))]
Date       Value
----------------
2019.01.01 150
2019.01.02 250
2019.01.04 450
2019.01.07 750
2019.01.08 850
2019.01.10 850
2019.01.14 950
2019.01.15 850
Answer 2:
One way you could go about this is to use an update statement like below:
q)N:5
q)update Sum:sum each Value where each Date within/:flip(Date-N;Date)from t
Date       Value Sum
--------------------
2019.01.01 150   150
2019.01.02 100   250
2019.01.04 200   450
2019.01.07 300   600
2019.01.08 100   600
2019.01.10 150   550
2019.01.14 200   350
2019.01.15 100   450
The within keyword checks whether each date in the Date column falls inside the window that runs from the current date minus N up to the current date. Each-right (/:) applies this check once per window.
q)flip(-5+t`Date;t`Date)
2018.12.27 2019.01.01
2018.12.28 2019.01.02
2018.12.30 2019.01.04
2019.01.02 2019.01.07
2019.01.03 2019.01.08
2019.01.05 2019.01.10
2019.01.09 2019.01.14
2019.01.10 2019.01.15
q)t[`Date]within/:flip(-5+t`Date;t`Date)
10000000b
11000000b
11100000b
01110000b
00111000b
00011100b
00000110b
00000111b
This returns a list of boolean lists, which can be turned into indexes using where each (each, since it's a list of lists) and then used to index back into Value.
q)where each t[`Date]within/:flip(-5+t`Date;t`Date)
,0
0 1
0 1 2
1 2 3
2 3 4
3 4 5
5 6
5 6 7
q)t[`Value]where each t[`Date]within/:flip(-5+t`Date;t`Date)
,150
150 100
150 100 200
100 200 300
200 300 100
300 100 150
150 200
150 200 100
Then sum each adds up each of these lists to give the desired result.
q)sum each t[`Value]where each t[`Date]within/:flip(-5+t`Date;t`Date)
150 250 450 600 600 550 350 450
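The steps above can be wrapped into a small function so that the look-back is a parameter. This is only a sketch: sumPrev is a hypothetical name, and it assumes the table has columns called Date and Value.

```q
t:([] Date:2019.01.01 2019.01.02 2019.01.04 2019.01.07 2019.01.08 2019.01.10 2019.01.14 2019.01.15; Value:150 100 200 300 100 150 200 100)
/ hypothetical helper: x is a table with Date and Value columns, n the look-back in days
sumPrev:{[x;n] update Sum:sum each Value where each Date within/:flip(Date-n;Date) from x}
sumPrev[t;5]    / Sum column: 150 250 450 600 600 550 350 450
```

Since every row is compared against every window, the work grows with the square of the row count, which is fine for a small table like this one.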
Answer 3:
You could also achieve this using an update statement like the one below. It doesn't require the flip and so should execute faster.
q)N:5
q)delete s from update runningSum:s-0^s[Date bin neg[1]+Date-N] from update s:sums Value from t
Date       Value runningSum
---------------------------
2019.01.01 150   150
2019.01.02 100   250
2019.01.04 200   450
2019.01.07 300   600
2019.01.08 100   600
2019.01.10 150   550
2019.01.14 200   350
2019.01.15 100   450
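This works by taking the running total of the Value column with sums, then using bin to look up the running total as of the last row dated before the window starts (more than N days back), and subtracting it. Where no such row exists, bin returns -1, the lookup gives null, and 0^ fills it with 0.
The delete keyword then removes the helper column s to give the required result.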
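The intermediate values can be inspected step by step. The sketch below uses the same table with short illustrative names (d, s, i); d-N+1 evaluates right to left as d-(N+1), which is the same date as neg[1]+Date-N.

```q
t:([] Date:2019.01.01 2019.01.02 2019.01.04 2019.01.07 2019.01.08 2019.01.10 2019.01.14 2019.01.15; Value:150 100 200 300 100 150 200 100)
N:5
d:t`Date
s:sums t`Value    / 150 250 450 750 850 1000 1200 1300
i:d bin d-N+1     / -1 -1 -1 0 1 2 4 4: last row dated before each window, -1 if none
0^s i             / 0 0 0 150 250 450 850 850: running total to subtract
s-0^s i           / 150 250 450 600 600 550 350 450
```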
q)\t:1000 delete s from update runningSum:s-0^s[Date bin neg[1]+Date-N] from update s:sums Value from t
7
While the time difference between this answer and Elliot's (the within-based update) is negligible for small values of N, this one is faster for larger values, e.g. N = 1000:
q)\t:1000 update Sum:sum each Value where each Date within/:flip(Date-1000;Date)from t
11
q)\t:1000 delete s from update runningSum:s-0^s[Date bin neg[1]+Date-1000] from update s:sums Value from t
7
It should be noted that this answer requires the Date column to be sorted, whereas Elliot's does not.
Another, slightly slower, way is to generate 0 values for all the dates between the min and max Date.
You can then use moving sums, msum, to get the values for the past 5 days.
The query first takes the min and max Date from the table and builds the list of all dates between them.
q)update t: 0^Value from ([]Date:{[x]  x[0]+til 1+x[1]-x[0]} exec (min[Date], max Date) from t) lj `Date xkey t
Date       Value t
--------------------
2019.01.01 150   150
2019.01.02 100   100
2019.01.03       0
2019.01.04 200   200
2019.01.05       0
2019.01.06       0
2019.01.07 300   300
2019.01.08 100   100
2019.01.09       0
2019.01.10 150   150
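The original rows are then left-joined onto this full date range and the missing values are filled with 0, so that msum counts calendar days rather than rows. Note that 5 msum spans the current day plus the four days before it, which is narrower than the Date-N to Date range used in the other answers. That is why the sums for 2019.01.07 and 2019.01.15 below differ from the expected result.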
q){[x] select from x where not null Value } update t: 5 msum 0^Value from ([]Date:{[x]  x[0]+til 1+x[1]-x[0]} exec (min[Date], max Date) from t) lj `Date xkey t
Date       Value t
--------------------
2019.01.01 150   150
2019.01.02 100   250
2019.01.04 200   450
2019.01.07 300   500
2019.01.08 100   600
2019.01.10 150   550
2019.01.14 200   350
2019.01.15 100   300
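To line this approach up with the expected table in the question, the window can be widened to N+1 days so that it covers Date-N through Date inclusive. A minimal sketch, with illustrative names d, dates and filled, and with the date range built from the column directly rather than with exec:

```q
t:([] Date:2019.01.01 2019.01.02 2019.01.04 2019.01.07 2019.01.08 2019.01.10 2019.01.14 2019.01.15; Value:150 100 200 300 100 150 200 100)
N:5
d:t`Date
dates:min[d]+til 1+max[d]-min d             / every calendar day from first to last Date
filled:([] Date:dates) lj `Date xkey t      / Value is null on days with no row
/ N+1 day moving sum over zero-filled values, then keep only the original rows
select from (update Sum:(N+1) msum 0^Value from filled) where not null Value
/ Sum column: 150 250 450 600 600 550 350 450
```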
I would also be careful when using Value as a column name, as you can run into issues with the value keyword.
I hope this answers your question.
Answer 4:
You can use the moving window mwin function to achieve this:
mwin:{[f;w;l] f each {1_x,y}\[w#0n;`float$l]}
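Setting f to sum and w to 5 gives the sum over the last 5 entries of the list l (here l:exec Value from t). Because mwin sees only the values and not the dates, the window is 5 rows rather than 5 calendar days, so the result matches the second (observation-based) query in the first answer rather than the table in the question: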
update Sum:(mwin[sum;5;] exec Value from t) from t
Date       Value Sum
--------------------
2019.01.01 150   150
2019.01.02 100   250
2019.01.04 200   450
2019.01.07 300   750
2019.01.08 100   850
2019.01.10 150   850
2019.01.14 200   950
2019.01.15 100   850
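For the specific case of a sum, the built-in msum produces the same row-based figures without a custom function; mwin remains useful when f is something with no built-in moving version. A short sketch on the same table:

```q
t:([] Date:2019.01.01 2019.01.02 2019.01.04 2019.01.07 2019.01.08 2019.01.10 2019.01.14 2019.01.15; Value:150 100 200 300 100 150 200 100)
/ 5-row moving sum; like mwin, this ignores the gaps between dates
update Sum:5 msum Value from t    / Sum column: 150 250 450 750 850 850 950 850
```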
Source: https://stackoverflow.com/questions/57034144/sum-values-from-the-previous-n-number-of-days-in-kdb