Moving average in SQLite

强颜欢笑 posted on 2019-12-06 12:08:57

Question


I would like to compute a moving average over data in a SQLite table. I found several methods for MySQL, but couldn't find an efficient one for SQLite.

In standard SQL, I think something like this should do it (I was not able to test it, though):

SELECT date, value, 
avg(value) OVER (ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) as MovingAverageWindow7
FROM t ORDER BY date;

However, I see two drawbacks:

  • This does not seem to work in SQLite
  • If the data have gaps in the preceding/following rows, the moving average is computed over a window wider than I actually want, since the frame is based only on the number of surrounding rows. A date condition therefore needs to be added

What I actually want is the average of 'value' at each date over +/-3 days (weekly moving average) or +/-15 days (monthly moving average).

Here is an example data set:

CREATE TABLE t ( date DATE, value INTEGER );

INSERT INTO t (date, value) VALUES ('2018-02-01', 8);
INSERT INTO t (date, value) VALUES ('2018-02-02', 2);
INSERT INTO t (date, value) VALUES ('2018-02-05', 5);
INSERT INTO t (date, value) VALUES ('2018-02-06', 4);
INSERT INTO t (date, value) VALUES ('2018-02-07', 1);
INSERT INTO t (date, value) VALUES ('2018-02-10', 6);
INSERT INTO t (date, value) VALUES ('2018-02-11', 0);
INSERT INTO t (date, value) VALUES ('2018-02-12', 2);
INSERT INTO t (date, value) VALUES ('2018-02-13', 1);
INSERT INTO t (date, value) VALUES ('2018-02-14', 3);
INSERT INTO t (date, value) VALUES ('2018-02-15', 11);
INSERT INTO t (date, value) VALUES ('2018-02-18', 4);
INSERT INTO t (date, value) VALUES ('2018-02-20', 1);
INSERT INTO t (date, value) VALUES ('2018-02-21', 5);
INSERT INTO t (date, value) VALUES ('2018-02-28', 10);
INSERT INTO t (date, value) VALUES ('2018-03-02', 6);
INSERT INTO t (date, value) VALUES ('2018-03-03', 7);
INSERT INTO t (date, value) VALUES ('2018-03-04', 3);
INSERT INTO t (date, value) VALUES ('2018-03-08', 5);
INSERT INTO t (date, value) VALUES ('2018-03-09', 6);
INSERT INTO t (date, value) VALUES ('2018-03-15', 1);
INSERT INTO t (date, value) VALUES ('2018-03-16', 3);
INSERT INTO t (date, value) VALUES ('2018-03-25', 5);
INSERT INTO t (date, value) VALUES ('2018-03-31', 1);
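
A note for readers on newer SQLite builds: window functions were added in SQLite 3.25.0, and RANGE frames with numeric offsets in 3.28.0. Assuming such a version, the date-based window described above could be sketched as below, run through Python's sqlite3 module. Ordering by julianday(date) is a choice made for this sketch so that the frame offset counts days rather than rows. Only the first five sample rows are loaded to keep it short.

```python
import sqlite3

# First five rows of the sample data from the question.
rows = [
    ("2018-02-01", 8),
    ("2018-02-02", 2),
    ("2018-02-05", 5),
    ("2018-02-06", 4),
    ("2018-02-07", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date DATE, value INTEGER)")
conn.executemany("INSERT INTO t (date, value) VALUES (?, ?)", rows)

# RANGE (not ROWS) makes the frame depend on the ORDER BY value itself,
# so gaps in the dates do not widen the window.
QUERY = """
SELECT date, value,
       AVG(value) OVER (
           ORDER BY julianday(date)
           RANGE BETWEEN 3 PRECEDING AND 3 FOLLOWING
       ) AS MovingAverageWindow7
FROM t
ORDER BY date;
"""

# Assumption: the SQLite library linked into Python is 3.28.0 or later.
if sqlite3.sqlite_version_info >= (3, 28, 0):
    window_result = conn.execute(QUERY).fetchall()
    for row in window_result:
        print(row)
else:
    window_result = None
    print("SQLite", sqlite3.sqlite_version, "is too old for RANGE offsets")
```

On older SQLite versions the query is rejected, which is why the answers below use subqueries and joins instead.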

Answer 1:


I think I actually found a solution:

SELECT date, value, 
  (SELECT AVG(value) FROM t t2 
   WHERE datetime(t1.date, '-3 days') <= datetime(t2.date) AND datetime(t1.date, '+3 days') >= datetime(t2.date)
   ) AS MAVG
FROM t t1
GROUP BY strftime('%Y-%m-%d', date); 

I don't know if it is the most efficient way, but it seems to work.

Edit: Applied to my real database of 20,000 rows, a weekly moving average over two parameters takes approximately 1 minute to compute.
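
One possible reason for the slowness is that wrapping t2.date in datetime() keeps SQLite from using an index on that column. Since the dates are stored as 'YYYY-MM-DD' strings, they sort correctly as plain text, so a sketch like the following compares the bare column against date() bounds and adds an index. The index name idx_t_date is made up for this example, the GROUP BY is left out because the sample has one row per date, and the effect on the 20,000-row database has not been measured here.

```python
import sqlite3

# First five rows of the sample data from the question.
rows = [
    ("2018-02-01", 8),
    ("2018-02-02", 2),
    ("2018-02-05", 5),
    ("2018-02-06", 4),
    ("2018-02-07", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date DATE, value INTEGER)")
conn.executemany("INSERT INTO t (date, value) VALUES (?, ?)", rows)

# Hypothetical index name; any name works.
conn.execute("CREATE INDEX idx_t_date ON t (date)")

# Same correlated subquery as above, but t2.date is compared as-is,
# so the range condition can be answered from the index.
QUERY = """
SELECT date, value,
  (SELECT AVG(value) FROM t t2
   WHERE t2.date BETWEEN date(t1.date, '-3 days') AND date(t1.date, '+3 days')
  ) AS MAVG
FROM t t1
ORDER BY date;
"""

indexed_result = conn.execute(QUERY).fetchall()
for row in indexed_result:
    print(row)

# The last column of each plan row describes how a table is accessed.
for step in conn.execute("EXPLAIN QUERY PLAN " + QUERY):
    print(step[-1])
```

The EXPLAIN QUERY PLAN output is printed so that the use of the index can be checked on the actual database rather than taken on faith.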

I see two options from here:

  • Find a more efficient way to compute this in SQLite
  • Compute the moving average in Python after extracting the data from SQLite
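
For the second option, a minimal sketch in plain Python could look like this. It assumes the rows have already been fetched with something like SELECT date, value FROM t ORDER BY date, and the function name moving_average is invented for this example. Two indices slide along the sorted dates, so each row enters and leaves the window once.

```python
from datetime import date


def moving_average(rows, half_window_days=3):
    """Date-based moving average over (iso_date, value) rows sorted by date.

    Each result is the mean of all values whose date lies within
    +/- half_window_days of the row's own date.
    """
    days = [date.fromisoformat(d).toordinal() for d, _ in rows]
    values = [v for _, v in rows]
    result = []
    lo = hi = 0      # current window is days[lo:hi]
    total = 0        # running sum of values[lo:hi]
    for i, day in enumerate(days):
        # Pull in rows up to half_window_days after the current date.
        while hi < len(days) and days[hi] <= day + half_window_days:
            total += values[hi]
            hi += 1
        # Drop rows more than half_window_days before the current date.
        while days[lo] < day - half_window_days:
            total -= values[lo]
            lo += 1
        result.append((rows[i][0], total / (hi - lo)))
    return result


# First five rows of the sample data from the question.
sample = [
    ("2018-02-01", 8),
    ("2018-02-02", 2),
    ("2018-02-05", 5),
    ("2018-02-06", 4),
    ("2018-02-07", 1),
]

for day, avg in moving_average(sample):
    print(day, round(avg, 3))
```

Passing half_window_days=15 would give the monthly variant mentioned in the question.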



Answer 2:


One approach is to create an intermediate table that maps each date to the groups it belongs to.

CREATE TABLE groups (date DATE, daygroup DATE);
INSERT INTO groups 
  SELECT date, strftime('%Y-%m-%d', datetime(date, '-1 days')) AS daygroup
  FROM t;  
INSERT INTO groups 
  SELECT date, strftime('%Y-%m-%d', datetime(date, '-2 days')) AS daygroup
  FROM t;  
INSERT INTO groups 
  SELECT date, strftime('%Y-%m-%d', datetime(date, '-3 days')) AS daygroup
  FROM t;  
INSERT INTO groups 
  SELECT date, strftime('%Y-%m-%d', datetime(date, '+1 days')) AS daygroup
  FROM t;  
INSERT INTO groups 
  SELECT date, strftime('%Y-%m-%d', datetime(date, '+2 days')) AS daygroup
  FROM t;  
INSERT INTO groups 
  SELECT date, strftime('%Y-%m-%d', datetime(date, '+3 days')) AS daygroup
  FROM t;  
INSERT INTO groups 
  SELECT date, date AS daygroup FROM t;

For example, you get:

SELECT * FROM groups WHERE date = '2018-02-05'

    date        daygroup
    2018-02-05  2018-02-04
    2018-02-05  2018-02-03
    2018-02-05  2018-02-02
    2018-02-05  2018-02-06
    2018-02-05  2018-02-07
    2018-02-05  2018-02-08
    2018-02-05  2018-02-05

indicating that '2018-02-05' belongs to the groups '2018-02-02' through '2018-02-08'. If a date belongs to a group, its value is included in the moving average computed for that group.

With this, calculating the moving average becomes straightforward:

SELECT
  d.date, d.value, c.ma
FROM
  t AS d
INNER JOIN 
  (SELECT 
    b.daygroup,
    avg(a.value) AS ma
  FROM 
    t AS a 
  INNER JOIN
    groups AS b
  ON a.date = b.date
  GROUP BY b.daygroup) AS c
ON
  d.date = c.daygroup

Note that the intermediate table has 7 times as many rows as the original table, and it grows proportionally as the window widens. This should be acceptable unless the original table is much larger.

I also experimented with 20,000 rows. The insert queries took 1.5 s and the select query took 0.5 s on my laptop.
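
The seven INSERT statements above could also be collapsed into one. The sketch below is an assumption-laden variant, not the original answer's code: it generates the offsets -3 to 3 with a recursive common table expression (the name offsets is made up here) and uses date() in place of strftime('%Y-%m-%d', datetime(...)), which yields the same 'YYYY-MM-DD' text. It then runs the join from above on the first five sample rows.

```python
import sqlite3

# First five rows of the sample data from the question.
rows = [
    ("2018-02-01", 8),
    ("2018-02-02", 2),
    ("2018-02-05", 5),
    ("2018-02-06", 4),
    ("2018-02-07", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date DATE, value INTEGER)")
conn.executemany("INSERT INTO t (date, value) VALUES (?, ?)", rows)
conn.execute("CREATE TABLE groups (date DATE, daygroup DATE)")

# offsets(n) yields -3, -2, ..., 3; each date is paired with every offset.
conn.execute("""
WITH RECURSIVE offsets(n) AS (
    SELECT -3
    UNION ALL
    SELECT n + 1 FROM offsets WHERE n < 3
)
INSERT INTO groups
  SELECT t.date, date(t.date, n || ' days') AS daygroup
  FROM t CROSS JOIN offsets;
""")

group_count = conn.execute("SELECT COUNT(*) FROM groups").fetchone()[0]
print(group_count, "rows in groups")

# The moving-average join from the answer, with ORDER BY added.
grouped_result = conn.execute("""
SELECT d.date, d.value, c.ma
FROM t AS d
INNER JOIN
  (SELECT b.daygroup, avg(a.value) AS ma
   FROM t AS a
   INNER JOIN groups AS b ON a.date = b.date
   GROUP BY b.daygroup) AS c
ON d.date = c.daygroup
ORDER BY d.date;
""").fetchall()

for row in grouped_result:
    print(row)
```

Changing the two bounds in the recursive part (-3 and 3) is enough to widen the window, for example to -15 and 15 for the monthly average.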

ADDED, perhaps better:

An alternative that does not require an intermediate table: the query below joins the table with itself, allowing a lag of up to 3 days in either direction, and then takes the average.

SELECT
  t1.date, avg(t2.value) AS MVG
FROM 
  t AS t1
INNER JOIN
  t AS t2
ON
  datetime(t1.date, '-3 days') <= datetime(t2.date) 
  AND 
  datetime(t1.date, '+3 days') >= datetime(t2.date)
GROUP BY
  t1.date
;
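
Since the question asks for both a +/-3 day and a +/-15 day window, the self-join can be wrapped in a small helper that takes the half-width as a parameter. This is only a sketch: the function name self_join_average is hypothetical, the date modifiers are passed as bound parameters, and the join condition compares t2.date directly against date() bounds, which relies on the dates being stored as 'YYYY-MM-DD' text.

```python
import sqlite3


def self_join_average(conn, half_window_days):
    """Run the self-join moving average for +/- half_window_days."""
    query = """
    SELECT t1.date, avg(t2.value) AS MVG
    FROM t AS t1
    INNER JOIN t AS t2
      ON t2.date BETWEEN date(t1.date, ?) AND date(t1.date, ?)
    GROUP BY t1.date
    ORDER BY t1.date;
    """
    # Modifiers such as '-3 days' and '+3 days' are bound as parameters.
    params = (f"-{half_window_days} days", f"+{half_window_days} days")
    return conn.execute(query, params).fetchall()


# First five rows of the sample data from the question.
rows = [
    ("2018-02-01", 8),
    ("2018-02-02", 2),
    ("2018-02-05", 5),
    ("2018-02-06", 4),
    ("2018-02-07", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date DATE, value INTEGER)")
conn.executemany("INSERT INTO t (date, value) VALUES (?, ?)", rows)

weekly = self_join_average(conn, 3)
monthly = self_join_average(conn, 15)
print(weekly)
print(monthly)
```

With only five rows spanning a week, every row falls inside the +/-15 day window, so the monthly result is the same overall mean for each date.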


Source: https://stackoverflow.com/questions/48488234/moving-average-in-sqlite
