user_id | date | distance
1 | 2019-04-09 00:00:00 | 2
1 | 2019-04-09 00:00:30 | 5
1 | 2019-04-09 00:01:00 | 3
1 | 2019-04-09 00:01:45 | 7
1 | 2019-04-09 00:02:30 | 6
1 | 2019-04-09 00:03:00 | 1
How do I sum distance over next row until threshold point is reached and reset the counter again.
For instance if the threshold value is 10 I am trying to get the following output:
1 | 2019-04-09 00:00:00 | 2
1 | 2019-04-09 00:00:30 | 7 (2 + 5)
1 | 2019-04-09 00:01:00 | 10 ( 7 + 3 )
1 | 2019-04-09 00:01:45 | 7 RESET
1 | 2019-04-09 00:02:30 | 13 (7 + 6 )
1 | 2019-04-09 00:03:00 | 1 RESET
But all I could achieve is get cumulative distance with following query:
SELECT *, sum(distance) over (order by date asc) as running_distance FROM table;
I am using PostgreSQL.
Live test: http://sqlfiddle.com/#!17/16716/2
SELECT *, sum_with_reset(distance, 10) over (order by date asc) as running_distance
FROM tbl;
User-defined aggregate sum_with_reset definition:
create or replace function sum_reset_accum(
_accumulated numeric, _current numeric, _threshold numeric
)
returns numeric as
$$
select case when _accumulated >= _threshold then
_current
else
_current + _accumulated
end
$$ language sql;
create aggregate sum_with_reset(numeric, numeric)
(
sfunc = sum_reset_accum,
stype = numeric,
initcond = 0
);
Data
CREATE TABLE tbl
("user_id" int, "date" timestamp, "distance" int)
;
INSERT INTO tbl
("user_id", "date", "distance")
VALUES
(1, '2019-04-09 00:00:00', 2),
(1, '2019-04-09 00:00:30', 5),
(1, '2019-04-09 00:01:00', 3),
(1, '2019-04-09 00:01:45', 7),
(1, '2019-04-09 00:02:30', 6),
(1, '2019-04-09 00:03:00', 1)
;
Output:
| user_id | date | distance | running_distance |
|---------|----------------------|----------|------------------|
| 1 | 2019-04-09T00:00:00Z | 2 | 2 |
| 1 | 2019-04-09T00:00:30Z | 5 | 7 |
| 1 | 2019-04-09T00:01:00Z | 3 | 10 |
| 1 | 2019-04-09T00:01:45Z | 7 | 7 |
| 1 | 2019-04-09T00:02:30Z | 6 | 13 |
| 1 | 2019-04-09T00:03:00Z | 1 | 1 |
One-liner:
create or replace function sum_reset_accum(
_accumulated numeric, _current numeric, _threshold numeric
)
returns numeric as
$$
select _current + _accumulated * (_accumulated < _threshold)::int
$$ language 'sql';
Postgres boolean can cast true to 1, false to 0 by using cast operator ::int
.
You can use plpgsql
language too:
create or replace function sum_reset_accum(
_accumulated numeric, _current numeric, _threshold numeric
)
returns numeric as
$$begin
return _current + _accumulated * (_accumulated < _threshold)::int;
end$$ language 'plpgsql';
Note that you cannot create plpgsql function on sqlfiddle.com, so you cannot test that plpgsql code on sqlfiddle.com. You can, on your machine though.
As variant you can use RECURSIVE CTE
to get it
Test table:
CREATE TABLE public.table_b (
user_id int4 NULL,
"date" timestamp NULL,
distance int4 NULL
);
INSERT INTO public.table_b (user_id,"date",distance) VALUES
(1,'2019-04-09 00:00:00.000',2)
,(1,'2019-04-09 00:00:30.000',5)
,(1,'2019-04-09 00:01:00.000',3)
,(1,'2019-04-09 00:01:45.000',7)
,(1,'2019-04-09 00:02:30.000',6)
,(1,'2019-04-09 00:03:00.000',1)
,(2,'2019-04-09 00:00:00.000',2)
,(2,'2019-04-09 00:00:30.000',5)
,(2,'2019-04-09 00:01:00.000',3)
,(2,'2019-04-09 00:01:45.000',7)
,(2,'2019-04-09 00:02:30.000',6)
,(2,'2019-04-09 00:03:00.000',1);
Query:
WITH RECURSIVE cte1 AS(
SELECT *,ROW_NUMBER()OVER(PARTITION BY user_id ORDER BY date) n
FROM table_b
),
cte2 AS(
SELECT user_id,date,distance,distance running_distance,n
FROM cte1
WHERE n=1
UNION ALL
SELECT c1.user_id,c1.date,c1.distance,CASE WHEN c2.running_distance<10 THEN c2.running_distance ELSE 0 END+c1.distance running_distance,c1.n
FROM cte1 c1
JOIN cte2 c2 ON c2.user_id=c1.user_id AND c2.n+1=c1.n
)
SELECT user_id,date,distance,running_distance
FROM cte2
ORDER BY user_id,date
来源:https://stackoverflow.com/questions/55587556/sum-until-threshold-value-reached-and-then-reset-the-counter