问题
I would like to compute a moving average over data in a SQLite table. I found several method in MySQL, but couldn't find an efficient one in SQLite.
In SQL, I think something like this should do it (however, I was not able to try it...) :
SELECT date, value,
avg(value) OVER (ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) as MovingAverageWindow7
FROM t ORDER BY date;
However, I see two drawbacks :
- This does not seems to work on sqlite
- If data are not continuous for few dates on preceding/following rows, it computes a moving average on a window which is wider than what I actually want since it is only based on the number of surrounding rows. Thus, a date condition should be added
Indeed, I would like it to compute the average of 'value' at each date, over +/-3 days (weekly moving average) or +/-15 days (monthly moving average)
Here is an example data set :
CREATE TABLE t ( date DATE, value INTEGER );
INSERT INTO t (date, value) VALUES ('2018-02-01', 8);
INSERT INTO t (date, value) VALUES ('2018-02-02', 2);
INSERT INTO t (date, value) VALUES ('2018-02-05', 5);
INSERT INTO t (date, value) VALUES ('2018-02-06', 4);
INSERT INTO t (date, value) VALUES ('2018-02-07', 1);
INSERT INTO t (date, value) VALUES ('2018-02-10', 6);
INSERT INTO t (date, value) VALUES ('2018-02-11', 0);
INSERT INTO t (date, value) VALUES ('2018-02-12', 2);
INSERT INTO t (date, value) VALUES ('2018-02-13', 1);
INSERT INTO t (date, value) VALUES ('2018-02-14', 3);
INSERT INTO t (date, value) VALUES ('2018-02-15', 11);
INSERT INTO t (date, value) VALUES ('2018-02-18', 4);
INSERT INTO t (date, value) VALUES ('2018-02-20', 1);
INSERT INTO t (date, value) VALUES ('2018-02-21', 5);
INSERT INTO t (date, value) VALUES ('2018-02-28', 10);
INSERT INTO t (date, value) VALUES ('2018-03-02', 6);
INSERT INTO t (date, value) VALUES ('2018-03-03', 7);
INSERT INTO t (date, value) VALUES ('2018-03-04', 3);
INSERT INTO t (date, value) VALUES ('2018-03-08', 5);
INSERT INTO t (date, value) VALUES ('2018-03-09', 6);
INSERT INTO t (date, value) VALUES ('2018-03-15', 1);
INSERT INTO t (date, value) VALUES ('2018-03-16', 3);
INSERT INTO t (date, value) VALUES ('2018-03-25', 5);
INSERT INTO t (date, value) VALUES ('2018-03-31', 1);
回答1:
I think I actually found a solution :
SELECT date, value,
(SELECT AVG(value) FROM t t2
WHERE datetime(t1.date, '-3 days') <= datetime(t2.date) AND datetime(t1.date, '+3 days') >= datetime(t2.date)
) AS MAVG
FROM t t1
GROUP BY strftime('%Y-%m-%d', date);
I don't know if it is the most efficient way, but it seems to work
Edit : Applied to my real database containing 20 000 rows, a weekly moving average over two parameters takes approximately 1 minute to be calculated.
I see two options there :
- There is a more efficient way to compute this with SQLite
- I compute the moving average in Python after extracting data from SQLite
回答2:
One approach is to create a intermediate table that maps each date to the groups it belong to.
CREATE TABLE groups (date DATE, daygroup DATE);
INSERT INTO groups
SELECT date, strftime('%Y-%m-%d', datetime(date, '-1 days')) AS daygroup
FROM t;
INSERT INTO groups
SELECT date, strftime('%Y-%m-%d', datetime(date, '-2 days')) AS daygroup
FROM t;
INSERT INTO groups
SELECT date, strftime('%Y-%m-%d', datetime(date, '-3 days')) AS daygroup
FROM t;
INSERT INTO groups
SELECT date, strftime('%Y-%m-%d', datetime(date, '+1 days')) AS daygroup
FROM t;
INSERT INTO groups
SELECT date, strftime('%Y-%m-%d', datetime(date, '+2 days')) AS daygroup
FROM t;
INSERT INTO groups
SELECT date, strftime('%Y-%m-%d', datetime(date, '+3 days')) AS daygroup
FROM t;
INSERT INTO groups
SELECT date, date AS daygroup FROM t;
You get for example,
SELECT * FROM groups WHERE date = '2018-02-05'
date daygroup
2018-02-05 2018-02-04
2018-02-05 2018-02-03
2018-02-05 2018-02-02
2018-02-05 2018-02-06
2018-02-05 2018-02-07
2018-02-05 2018-02-08
2018-02-05 2018-02-05
indicating that '2018-02-05' belongs to groups '2018-02-02' to '2018-02-08'. If a date belongs to a group, then the value of the data joins the calculation of moving average for the group.
With this, calculating the moving average becomes straightforward:
SELECT
d.date, d.value, c.ma
FROM
t AS d
INNER JOIN
(SELECT
b.daygroup,
avg(a.value) AS ma
FROM
t AS a
INNER JOIN
groups AS b
ON a.date = b.date
GROUP BY b.daygroup) AS c
ON
d.date = c.daygroup
Note that the number of rows of intermediate table is 7 times as large as that of original table, it grows proportionately as taking wider the window. This should be acceptable unless you have much larger table.
I also experimented with 20 000 rows. The insert query took 1.5s and select query took 0.5s on my laptop.
ADDED, perhaps better.
An alternative that does not require intermediate table. The query below merges the table with itself, in a way 3 days lag is allowed, then takes average.
SELECT
t1.date, avg(t2.value) AS MVG
FROM
t AS t1
INNER JOIN
t AS t2
ON
datetime(t1.date, '-3 days') <= datetime(t2.date)
AND
datetime(t1.date, '+3 days') >= datetime(t2.date)
GROUP BY
t1.date
;
来源:https://stackoverflow.com/questions/48488234/moving-average-in-sqlite