Question
I have a table in BigQuery with data every 30 minutes, and I want to show the data every 5 minutes. Currently I am using this query to fill the null values with the last existing value:
SELECT
  SETTLEMENTDATE,
  DUID,
  LAST_VALUE(SCADAVALUE IGNORE NULLS) OVER (
    PARTITION BY DUID ORDER BY SETTLEMENTDATE) AS SCADAVALUE
FROM x
Is it possible to do linear interpolation instead? Something like this:
I have the column SETTLEMENTDATE, which is at 5-minute intervals, and the column SCADAVALUEORIGIN, which has a value every 30 minutes and is null otherwise. I want to add a column SCADAINTERPOLATION that spreads the values evenly between two consecutive 30-minute values. Another issue: as I refresh the data every 5 minutes, the last value will show null for the following (5, 10, 15, 20, 25) minutes. I hope my explanation is clear.
Answer 1:
Below is for BigQuery Standard SQL. It expects one row per 30-minute reading and expands each row into six 5-minute rows.
#standardSQL
SELECT
  TIMESTAMP_ADD(SETTLEMENTDATE, INTERVAL 5 * i MINUTE) AS SETTLEMENTDATE,
  IF(i = 0, SCADAVALUEORIGIN, NULL) AS SCADAVALUEORIGIN,
  SCADAVALUEORIGIN AS SCADAVALUE,  -- step fill: repeats the 30-minute reading
  -- move i/6 of the way to the next reading; IFNULL carries the last reading forward
  ROUND(SCADAVALUEORIGIN + IFNULL((next_value - SCADAVALUEORIGIN) / 6 * i, 0), 3) AS SCADAINTERPOLATION
FROM (
  SELECT SETTLEMENTDATE, SCADAVALUEORIGIN,
    LEAD(SCADAVALUEORIGIN) OVER(ORDER BY SETTLEMENTDATE) AS next_value
  FROM `project.dataset.table`
), UNNEST(GENERATE_ARRAY(0, 5)) i
Applied to the sample data from your question, this returns one row per 5-minute step, with SCADAINTERPOLATION filled in between the 30-minute readings.
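If the table already contains the 5-minute rows (with SCADAVALUEORIGIN null between readings) and several DUIDs, as described in the question, a variant along the following lines might fit better. This is only a sketch under assumptions: SETTLEMENTDATE is taken to be a TIMESTAMP, the table name is a placeholder, and the helper columns prev_value, prev_time, next_value and next_time are names introduced here for illustration.

```sql
#standardSQL
SELECT
  SETTLEMENTDATE, DUID, SCADAVALUEORIGIN,
  CASE
    -- no later reading yet (the most recent rows): carry the last reading forward
    WHEN next_value IS NULL THEN prev_value
    -- otherwise move proportionally from the previous reading to the next one
    ELSE prev_value + (next_value - prev_value)
         * TIMESTAMP_DIFF(SETTLEMENTDATE, prev_time, MINUTE)
         / TIMESTAMP_DIFF(next_time, prev_time, MINUTE)
  END AS SCADAINTERPOLATION
FROM (
  SELECT
    SETTLEMENTDATE, DUID, SCADAVALUEORIGIN,
    -- latest non-null reading at or before this row, and its timestamp
    LAST_VALUE(SCADAVALUEORIGIN IGNORE NULLS) OVER up_to_row AS prev_value,
    LAST_VALUE(IF(SCADAVALUEORIGIN IS NOT NULL, SETTLEMENTDATE, NULL) IGNORE NULLS)
      OVER up_to_row AS prev_time,
    -- earliest non-null reading strictly after this row, and its timestamp
    FIRST_VALUE(SCADAVALUEORIGIN IGNORE NULLS) OVER after_row AS next_value,
    FIRST_VALUE(IF(SCADAVALUEORIGIN IS NOT NULL, SETTLEMENTDATE, NULL) IGNORE NULLS)
      OVER after_row AS next_time
  FROM `project.dataset.table`
  WINDOW
    up_to_row AS (PARTITION BY DUID ORDER BY SETTLEMENTDATE
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
    after_row AS (PARTITION BY DUID ORDER BY SETTLEMENTDATE
                  ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
)
```

Because the weights come from the timestamps rather than a fixed divisor of 6, this should also cope with an occasional missing 5-minute row. Rows that precede the first reading of a DUID stay null, since there is nothing to interpolate from.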
Answer 2:
I can speculate that you want something like this:
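select timestamp_add(r.ts, interval mins minute) as ts,
       -- weighted average of the current and the next 30-minute value;
       -- coalesce keeps the last value flat when there is no next one yet
       (r.val * (30 - mins) +
        coalesce(r.next_val, r.val) * mins
       ) / 30 as val
from (
      -- take the next value before the cross join multiplies the rows
      select ts, val, lead(val) over (order by ts) as next_val
      from t
     ) as r cross join
     unnest(generate_array(0, 25, 5)) mins;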
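The weighted average in this query is the usual linear interpolation formula written differently. As a short check, with v_0 the current 30-minute value, v_1 the next one, and m the offset in minutes (symbols introduced here for illustration):

```latex
\frac{v_0\,(30 - m) + v_1\,m}{30}
  = v_0 + (v_1 - v_0)\,\frac{m}{30},
  \qquad m \in \{0, 5, 10, 15, 20, 25\}
```

Setting m = 5i gives v_0 + (v_1 - v_0) i / 6, the form used in the first answer. With hypothetical values v_0 = 100 and v_1 = 160, the 10-minute step comes out as (100 * 20 + 160 * 10) / 30 = 120.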
Source: https://stackoverflow.com/questions/60199557/bigquery-fill-missing-values-with-linear-interpolation