BigQuery: fill missing values with linear interpolation

Submitted by 情到浓时终转凉″ on 2021-02-08 11:25:24

Question


I have a table in BigQuery with data every 30 minutes, and I want to show the data every 5 minutes. Currently I am using this query to fill the null values with the last existing value:

SELECT
  SETTLEMENTDATE,
  DUID,
  LAST_VALUE(SCADAVALUE IGNORE NULLS) OVER (
    PARTITION BY DUID ORDER BY SETTLEMENTDATE) AS SCADAVALUE
FROM x

Instead, is it possible to do linear interpolation, along the following lines?

I have the column SETTLEMENTDATE, which is at 5-minute intervals, and the column SCADAVALUEORIGIN, which has a value every 30 minutes and is otherwise null. I want to add a column SCADAINTERPOLATION that spreads the values evenly between two consecutive 30-minute values. Another issue: because I refresh the data every 5 minutes, the last value will show null for minutes 5, 10, 15, 20 and 25. I hope my explanation is clear.


Answer 1:


Below is for BigQuery Standard SQL

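#standardSQL
SELECT
  -- one output row per 5-minute step inside each 30-minute interval
  TIMESTAMP_ADD(SETTLEMENTDATE, INTERVAL 5 * i MINUTE) AS SETTLEMENTDATE,
  -- original value only on the 30-minute boundary, NULL on the generated rows
  IF(i = 0, SCADAVALUEORIGIN, NULL) AS SCADAVALUEORIGIN,
  -- last known value carried forward
  SCADAVALUEORIGIN AS SCADAVALUE,
  -- linear step towards the next value; stays flat when there is no next value yet
  ROUND(SCADAVALUEORIGIN + IFNULL((next_value - SCADAVALUEORIGIN) / 6 * i, 0), 3) AS SCADAINTERPOLATION
FROM (
  SELECT SETTLEMENTDATE, SCADAVALUEORIGIN,
    LEAD(SCADAVALUEORIGIN) OVER(ORDER BY SETTLEMENTDATE) next_value
  FROM `project.dataset.table`
), UNNEST(GENERATE_ARRAY(0, 5)) i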

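Applied to the sample data from your question, this returns one row per 5-minute step, with SCADAINTERPOLATION filled in evenly between the 30-minute values.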
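
To check the arithmetic outside BigQuery, here is a minimal Python sketch of the same fan-out logic. The function name `interpolate_5min` and the two sample rows are hypothetical and are not taken from the question's data. The sketch assumes the input already has one row per 30 minutes, as the query above does.

```python
from datetime import datetime, timedelta

def interpolate_5min(rows):
    """rows: (timestamp, value) pairs at 30-minute spacing, sorted by timestamp.
    Returns (timestamp, interpolated value) pairs at 5-minute spacing."""
    out = []
    for idx, (ts, value) in enumerate(rows):
        # Counterpart of LEAD(...): the next 30-minute value, None on the last row.
        next_value = rows[idx + 1][1] if idx + 1 < len(rows) else None
        for i in range(6):  # counterpart of GENERATE_ARRAY(0, 5)
            # Counterpart of IFNULL(..., 0): no next value means no step.
            step = 0 if next_value is None else (next_value - value) / 6 * i
            out.append((ts + timedelta(minutes=5 * i), round(value + step, 3)))
    return out

# Hypothetical sample: two consecutive 30-minute readings.
rows = [
    (datetime(2020, 1, 1, 0, 0), 10.0),
    (datetime(2020, 1, 1, 0, 30), 16.0),
]
result = interpolate_5min(rows)
for ts, value in result:
    print(ts, value)
```

The first interval climbs from 10.0 to 15.0 in steps of 1.0, and the last interval stays flat at 16.0 because no later value exists yet. That matches the IFNULL fallback in the query and covers the 5 to 25 minute gap mentioned in the question. Python's `round` and BigQuery's `ROUND` may treat exact ties differently, so the third decimal can differ in edge cases.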




Answer 2:


I can speculate that you want something like this:

select timestamp_add(t.ts, interval min minute),
       (val * (30 - min) +
        lead(val) over (order by ts) * min
       ) / 30
from t cross join
     unnest(generate_array(0, 25, 5)) min;
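
The expression above is a weighted average of the current and next value. As a rough illustration, the Python sketch below mirrors it, with `None` standing in for SQL NULL. The helper name `weighted` and the sample values 10.0 and 16.0 are hypothetical.

```python
def weighted(val, next_val, minutes):
    # Mirrors (val * (30 - min) + lead(val) * min) / 30.
    # In SQL, arithmetic with NULL yields NULL, modelled here by returning None.
    if next_val is None:
        return None
    return (val * (30 - minutes) + next_val * minutes) / 30

offsets = range(0, 30, 5)  # counterpart of generate_array(0, 25, 5)

# An interval that has a following value.
middle = [weighted(10.0, 16.0, m) for m in offsets]

# The most recent interval, where lead(val) is NULL.
last = [weighted(16.0, None, m) for m in offsets]

print(middle)
print(last)
```

With a following value present, the result matches the step form `val + (next - val) / 6 * i`. For the most recent interval every row comes out as NULL, including the one at offset 0, so a fallback such as the IFNULL used in Answer 1 would be needed to keep the last value visible.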


Source: https://stackoverflow.com/questions/60199557/bigquery-fill-missing-values-with-linear-interpolation
