Question
I have a table with the following two columns:
Initial Table
Date Value
-------------------
2019.01.01 | 150
2019.01.02 | 100
2019.01.04 | 200
2019.01.07 | 300
2019.01.08 | 100
2019.01.10 | 150
2019.01.14 | 200
2019.01.15 | 100
For each row, I would like to sum the values from the previous N days. In this case, N = 5.
Resultant Table
Date Value Sum
------------------------
2019.01.01 | 150 | 150 (01 -> ..)
2019.01.02 | 100 | 250 (02 -> 01)
2019.01.04 | 200 | 450 (04 -> 01)
2019.01.07 | 300 | 600 (07 -> 02)
2019.01.08 | 100 | 600 (08 -> 04)
2019.01.10 | 150 | 550 (10 -> 07)
2019.01.14 | 200 | 350 (14 -> 10)
2019.01.15 | 100 | 450 (15 -> 10)
Query
t:([] Date: 2019.01.01 2019.01.02 2019.01.04 2019.01.07 2019.01.08 2019.01.10 2019.01.14 2019.01.15; Value: 150 100 200 300 100 150 200 100)
How can I go about doing that?
Answer 1:
A window join is a pretty natural fit here. See: https://code.kx.com/v2/ref/wj/
q)wj1[-5 0+\:t`Date;`Date;t;(t;(sum;`Value))]
Date Value
----------------
2019.01.01 150
2019.01.02 250
2019.01.04 450
2019.01.07 600
2019.01.08 600
2019.01.10 550
2019.01.14 350
2019.01.15 450
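The output above replaces Value with the windowed sum. If the original Value column should be kept next to the total, one possible sketch is to pull the aggregated column out of the wj1 result and attach it with update. This is an adaptation rather than part of the original answer; the names w and r are illustrative, and Sum is taken from the desired table in the question.

```q
/ sample table from the question
t:([] Date:2019.01.01 2019.01.02 2019.01.04 2019.01.07 2019.01.08 2019.01.10 2019.01.14 2019.01.15; Value:150 100 200 300 100 150 200 100)

/ window bounds: a list of start dates (Date-5) and a list of end dates (Date)
w:-5 0+\:t`Date

/ wj1 returns t with Value replaced by the per-window sum;
/ extract that column and add it back to t under a new name
r:update Sum:wj1[w;`Date;t;(t;(sum;`Value))]`Value from t
```

The Sum column of r should then carry the same totals as the Value column printed above, with the original Value left intact.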
To go back 5 observations rather than 5 calendar days you could do:
q)wj1[{(4 xprev x;x)}t`Date;`Date;t;(t;(sum;`Value))]
Date Value
----------------
2019.01.01 150
2019.01.02 250
2019.01.04 450
2019.01.07 750
2019.01.08 850
2019.01.10 850
2019.01.14 950
2019.01.15 850
Answer 2:
One way you could go about this is to use an update statement like below:
q)N:5
q)update Sum:sum each Value where each Date within/:flip(Date-N;Date)from t
Date Value Sum
--------------------
2019.01.01 150 150
2019.01.02 100 250
2019.01.04 200 450
2019.01.07 300 600
2019.01.08 100 600
2019.01.10 150 550
2019.01.14 200 350
2019.01.15 100 450
The within keyword checks whether each date in the Date column falls inside the window running from the current date minus N to the current date. Applying it once per window is possible with an each right (/:).
q)flip(-5+t`Date;t`Date)
2018.12.27 2019.01.01
2018.12.28 2019.01.02
2018.12.30 2019.01.04
2019.01.02 2019.01.07
2019.01.03 2019.01.08
2019.01.05 2019.01.10
2019.01.09 2019.01.14
2019.01.10 2019.01.15
q)t[`Date]within/:flip(-5+t`Date;t`Date)
10000000b
11000000b
11100000b
01110000b
00111000b
00011100b
00000110b
00000111b
This returns a list of boolean lists, which can be turned into indexes using where each (each because it's a list of lists), and those indexes are then used to index back into Value.
q)where each t[`Date]within/:flip(-5+t`Date;t`Date)
,0
0 1
0 1 2
1 2 3
2 3 4
3 4 5
5 6
5 6 7
q)t[`Value]where each t[`Date]within/:flip(-5+t`Date;t`Date)
,150
150 100
150 100 200
100 200 300
200 300 100
300 100 150
150 200
150 200 100
Then, using sum each, you can sum each of the lists of numbers to get your desired result.
q)sum each t[`Value]where each t[`Date]within/:flip(-5+t`Date;t`Date)
150 250 450 600 600 550 350 450
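The steps above can be gathered into a small reusable function. The sketch below assumes a hypothetical helper name, sumPrevDays, which is not a built-in; it simply parameterises the expression already shown. The second call illustrates that this approach does not depend on the rows being sorted by date.

```q
/ sample table from the question
t:([] Date:2019.01.01 2019.01.02 2019.01.04 2019.01.07 2019.01.08 2019.01.10 2019.01.14 2019.01.15; Value:150 100 200 300 100 150 200 100)

/ sumPrevDays is a hypothetical helper name, not part of q
/ n: window length in days; d: list of dates; v: list of values
sumPrevDays:{[n;d;v] sum each v where each d within/:flip(d-n;d)}

r:update Sum:sumPrevDays[5;Date;Value] from t
/ r`Sum is 150 250 450 600 600 550 350 450

/ row order does not matter: reversing the table just reverses the sums
r2:update Sum:sumPrevDays[5;Date;Value] from reverse t
/ r2`Sum is 450 350 550 600 600 450 250 150
```

The cost is that every date is compared against every window, so the work grows with the square of the row count.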
Answer 3:
You could also achieve this using an update statement like the one below. It doesn't require the flip and so should execute faster.
q)N:5
q)delete s from update runningSum:s-0^s[Date bin neg[1]+Date-N] from update s:sums Value from t
Date Value runningSum
---------------------------
2019.01.01 150 150
2019.01.02 100 250
2019.01.04 200 450
2019.01.07 300 600
2019.01.08 100 600
2019.01.10 150 550
2019.01.14 200 350
2019.01.15 100 450
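This works by using sums on the Value column to build a running total, and then bin to look up the running total as of the last row dated before the start of the N-day window. Subtracting the two leaves the sum inside the window. The delete keyword then removes the intermediate s column to obtain your required result.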
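To make the running-total subtraction concrete, the intermediate values can be computed one at a time outside the update. This is only a sketch for inspection; the variable names d, i, p and r are illustrative and do not appear in the answer.

```q
/ sample table from the question
t:([] Date:2019.01.01 2019.01.02 2019.01.04 2019.01.07 2019.01.08 2019.01.10 2019.01.14 2019.01.15; Value:150 100 200 300 100 150 200 100)

N:5
d:t`Date
s:sums t`Value    / 150 250 450 750 850 1000 1200 1300
i:d bin d-1+N     / -1 -1 -1 0 1 2 4 4 : last row dated before Date-N, -1 if none
p:0^s i           / 0 0 0 150 250 450 850 850 : running total just before each window
r:s-p             / 150 250 450 600 600 550 350 450
```

Indexing s with -1 yields a null, which is why the 0^ fill is needed for the rows whose window reaches back before the first date.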
q)\t:1000 delete s from update runningSum:s-0^s[Date bin neg[1]+Date-N] from update s:sums Value from t
7
While the time difference between this answer and Elliot's (the update with within shown in Answer 2) is negligible for small values of N, this one is faster for larger values, e.g. N = 1000:
q)\t:1000 update Sum:sum each Value where each Date within/:flip(Date-1000;Date)from t
11
q)\t:1000 delete s from update runningSum:s-0^s[Date bin neg[1]+Date-1000] from update s:sums Value from t
7
It should be noted that this answer requires the date field to be sorted, where Elliot's does not.
Another, slightly slower, way is to generate 0 values for all the dates that lie between the min and max Date. You can then use moving sums, msum, to get the values for the past 5 days.
It first takes the min and max Date from the table and makes a list of the dates that span between them.
q)update t: 0^Value from ([]Date:{[x] x[0]+til 1+x[1]-x[0]} exec (min[Date], max Date) from t) lj `Date xkey t
Date       Value t
--------------------
2019.01.01 150   150
2019.01.02 100   100
2019.01.03       0
2019.01.04 200   200
2019.01.05       0
2019.01.06       0
2019.01.07 300   300
2019.01.08 100   100
2019.01.09       0
2019.01.10 150   150
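The generated dates are joined to the table and the empty values are filled with 0, so that a fixed-width moving sum covers calendar days rather than rows and any missing dates are taken into account. Note that 5 msum spans the current date and the four days before it, one day fewer than the Date-N to Date window used in the answers above, so the 2019.01.07 and 2019.01.15 rows below come out as 500 and 300 rather than 600 and 450.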
q){[x] select from x where not null Value } update t: 5 msum 0^Value from ([]Date:{[x] x[0]+til 1+x[1]-x[0]} exec (min[Date], max Date) from t) lj `Date xkey t
Date Value t
--------------------
2019.01.01 150 150
2019.01.02 100 250
2019.01.04 200 450
2019.01.07 300 500
2019.01.08 100 600
2019.01.10 150 550
2019.01.14 200 350
2019.01.15 100 300
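If the aim is to reproduce the Date-N to Date window from the question's desired table, the moving sum has to be N+1 days wide. A minimal sketch of that adjustment, assuming the same fill-the-gaps approach (the names lo, hi, filled and r are illustrative):

```q
/ sample table from the question
t:([] Date:2019.01.01 2019.01.02 2019.01.04 2019.01.07 2019.01.08 2019.01.10 2019.01.14 2019.01.15; Value:150 100 200 300 100 150 200 100)

N:5
lo:min t`Date
hi:max t`Date

/ one row per calendar day between the first and last Date; Value is null on added days
filled:([] Date:lo+til 1+hi-lo) lj `Date xkey t

/ N+1 days wide: the current day plus the N days before it
r:select from (update Sum:(N+1) msum 0^Value from filled) where not null Value
/ r`Sum is 150 250 450 600 600 550 350 450
```

As in the answer, filtering on not null Value drops the padded days again, which also means any row whose Value was genuinely null in the input would be dropped.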
I would also be careful when using Value as a column name, as you can run into issues with the value keyword.
I hope this answers your question.
Answer 4:
You can use the moving window function mwin to achieve this:
mwin:{[f;w;l] f each {1_x,y}\[w#0n;`float$l]}
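You can then set the function f to sum and get the results over the last w:5 entries of the list of values l (here l:exec Value from t). Note that the window counts rows rather than calendar days, so the totals from 2019.01.07 onwards differ from the desired table in the question: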
update Sum:(mwin[sum;5;] exec Value from t) from t
Date Value Sum
--------------------
2019.01.01 150 150
2019.01.02 100 250
2019.01.04 200 450
2019.01.07 300 750
2019.01.08 100 850
2019.01.10 150 850
2019.01.14 200 950
2019.01.15 100 850
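Since mwin slides over a fixed number of rows, for the particular case f = sum the built-in msum should give the same totals. The sketch below compares the two on the question's data; a and b are illustrative names.

```q
/ sample table from the question
t:([] Date:2019.01.01 2019.01.02 2019.01.04 2019.01.07 2019.01.08 2019.01.10 2019.01.14 2019.01.15; Value:150 100 200 300 100 150 200 100)

/ moving window function as given in the answer
mwin:{[f;w;l] f each {1_x,y}\[w#0n;`float$l]}

a:mwin[sum;5;t`Value]    / float list, because mwin casts its input to float
b:5 msum t`Value         / long list
/ both hold 150 250 450 750 850 850 950 850, a sum over the last 5 rows
```

mwin remains the more general tool when f is a function without a built-in moving equivalent.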
Source: https://stackoverflow.com/questions/57034144/sum-values-from-the-previous-n-number-of-days-in-kdb