window-functions

Retrieve last known value for each column of a row

拈花ヽ惹草 submitted on 2019-12-01 23:48:36
Question: I'm not sure of the right words to ask this question, so I will break it down. I have a table as follows:

    date_time | a | b | c

Last 4 rows:

    15/10/2013 11:45:00 | null   | 'timtim' | 'fred'
    15/10/2013 13:00:00 | 'tune' | 'reco'   | null
    16/10/2013 12:00:00 | 'abc'  | null     | null
    16/10/2013 13:00:00 | null   | 'died'   | null

How would I get the last record, but with each null replaced by the value from the most recent previous record where that column is not null? In my provided example the row returned would be 16/10
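
One common way to approach this kind of "last known value per column" request is to count the non-null values seen so far in each column and use that count as a group key, so every null row falls into the group opened by the last non-null value. The sketch below runs that idea in SQLite through Python's sqlite3 module. It is an illustration under assumptions, not the asker's schema: the table name `t` is made up, SQLite 3.25 or later is assumed for window function support, and the timestamps are rewritten as ISO strings so that text ordering matches time ordering.

```python
import sqlite3

# Sample rows from the question. Timestamps are rewritten in ISO format
# (an assumption) so that sorting the text sorts by time.
ROWS = [
    ("2013-10-15 11:45:00", None, "timtim", "fred"),
    ("2013-10-15 13:00:00", "tune", "reco", None),
    ("2013-10-16 12:00:00", "abc", None, None),
    ("2013-10-16 13:00:00", None, "died", None),
]

QUERY = """
WITH marked AS (
    -- count() skips nulls, so the count only grows on non-null rows;
    -- each null row therefore shares a group with the last non-null value.
    SELECT date_time, a, b, c,
           count(a) OVER (ORDER BY date_time) AS grp_a,
           count(b) OVER (ORDER BY date_time) AS grp_b,
           count(c) OVER (ORDER BY date_time) AS grp_c
    FROM t
),
filled AS (
    -- Each group holds at most one non-null value, so max() returns it.
    SELECT date_time,
           max(a) OVER (PARTITION BY grp_a) AS a,
           max(b) OVER (PARTITION BY grp_b) AS b,
           max(c) OVER (PARTITION BY grp_c) AS c
    FROM marked
)
SELECT date_time, a, b, c
FROM filled
ORDER BY date_time DESC
LIMIT 1
"""


def last_known_row(rows):
    """Return the latest row with nulls replaced by the last known values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (date_time TEXT, a TEXT, b TEXT, c TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchone()
    conn.close()
    return result


print(last_known_row(ROWS))
```

For the four sample rows this returns the latest timestamp together with 'abc', 'died' and 'fred'.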

lag to get first non null value since the previous null value

感情迁移 submitted on 2019-12-01 23:28:37
Below is an example of what I'm trying to achieve in a Redshift database. I have a variable current_value and I want to create a new column value_desired that is:

    the same as current_value if the previous row is null
    equal to the last preceding non-null value if the previous row is non-null

It sounds like an easy task but I haven't found a way to do it yet.

    row_numb | current_value | value_desired
    1        |               |
    2        |               |
    3        | 47            | 47
    4        |               |
    5        | 45            | 45
    6        |               |
    7        |               |
    8        | 42            | 42
    9        | 41            | 42
    10       | 40            | 42
    11       | 39            | 42
    12       | 38            | 42
    13       |               |
    14       | 36            | 36
    15       |               |
    16       |               |
    17       | 33            | 33
    18       | 32            | 33

I've tried with the LAG() function but I can only get the previous value (not the first in the
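
The table above can be reproduced with a running count of nulls: every null row opens a new group, and the desired value is the first non-null value inside the current group. The following is a minimal sketch of that idea run in SQLite via Python, not in Redshift; the table name `t` is an assumption, and the `sum(current_value IS NULL)` shortcut relies on SQLite treating booleans as 0 and 1, so another engine would likely need a CASE expression instead.

```python
import sqlite3

# current_value for row_numb 1..18, read from the table in the question;
# None marks an empty cell.
CURRENT = [None, None, 47, None, 45, None, None, 42, 41, 40, 39, 38,
           None, 36, None, None, 33, 32]

QUERY = """
WITH grouped AS (
    -- Running count of null rows: each null starts a new group, and the
    -- non-null rows that follow it stay in that group.
    SELECT row_numb, current_value,
           sum(current_value IS NULL) OVER (ORDER BY row_numb) AS grp
    FROM t
)
SELECT row_numb,
       current_value,
       CASE WHEN current_value IS NOT NULL THEN
           -- Sort non-null rows first, then by position, so first_value()
           -- picks the first non-null value of the group.
           first_value(current_value) OVER (
               PARTITION BY grp
               ORDER BY current_value IS NULL, row_numb)
       END AS value_desired
FROM grouped
ORDER BY row_numb
"""


def first_value_since_null(values):
    """Return value_desired for each position in `values`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (row_numb INTEGER, current_value INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", enumerate(values, start=1))
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return [desired for _, _, desired in rows]


print(first_value_since_null(CURRENT))
```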

Multiple averages over evenly spaced intervals

為{幸葍}努か submitted on 2019-12-01 20:48:54
I'm trying to learn SQL, so be patient with me. I'm using PostgreSQL 9.3. I want to average a column based on a window of dates. I've been able to write window functions that accomplish this with a set interval, but I'd like to be able to do this with a growing interval. By this I mean:

    average values from date_0 to date_1
    average values from date_0 to date_2
    average values from date_0 to date_3
    .....

So date_0 stays the same, and date_x grows and creates a larger sample. I'm assuming there is a better way than running a query for each range I'd like to average. Any advice is
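
A growing interval anchored at the first date corresponds to a window frame that starts at the beginning of the ordering and ends at the current row. The sketch below shows that frame in SQLite through Python; the table `readings`, its columns and the sample values are all invented for illustration. The frame clause itself (`ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`) is standard SQL syntax that PostgreSQL also accepts.

```python
import sqlite3

# Hypothetical sample data: one value per day.
READINGS = [
    ("2014-01-01", 10.0),
    ("2014-01-02", 20.0),
    ("2014-01-03", 30.0),
    ("2014-01-04", 40.0),
]

QUERY = """
SELECT day,
       avg(value) OVER (
           ORDER BY day
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS avg_so_far
FROM readings
ORDER BY day
"""


def growing_averages(rows):
    """Average from the first day up to each day, in a single query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (day TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for day, avg_so_far in growing_averages(READINGS):
    print(day, avg_so_far)
```

Each output row covers one more day than the previous one, so a single pass replaces one query per range.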

What's the default window frame for window functions

心不动则不痛 submitted on 2019-12-01 18:30:31
Running the following code:

    val sales = Seq(
      (0, 0, 0, 5),
      (1, 0, 1, 3),
      (2, 0, 2, 1),
      (3, 1, 0, 2),
      (4, 2, 0, 8),
      (5, 2, 2, 8))
      .toDF("id", "orderID", "prodID", "orderQty")

    val orderedByID = Window.orderBy('id)
    val totalQty = sum('orderQty).over(orderedByID).as('running_total)
    val salesTotalQty = sales.select('*, totalQty).orderBy('id)
    salesTotalQty.show

The result is:

    +---+-------+------+--------+-------------+
    | id|orderID|prodID|orderQty|running_total|
    +---+-------+------+--------+-------------+
    |  0|      0|     0|       5|            5|
    |  1|      0|     1|       3|            8|
    |  2|      0|     2|       1|            9|
    |  3|      1|     0|       2|           11|
    |  4|      2|     0|       8|           19|
    |  5|      2|
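
The running total above is what a frame of "everything from the start of the partition up to the current row" would produce. As a rough illustration of how such a default behaves, the sketch below repeats the experiment in SQLite via Python rather than in Spark. In SQLite, a window with ORDER BY and no explicit frame defaults to `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, which also pulls in rows that tie with the current row on the ordering column. Spark's documentation describes a similar growing range frame when an ordering is defined, but that should be checked against the Spark version in use. The helper name `totals` is made up.

```python
import sqlite3

# The same rows as in the question: (id, orderID, prodID, orderQty).
SALES = [(0, 0, 0, 5), (1, 0, 1, 3), (2, 0, 2, 1),
         (3, 1, 0, 2), (4, 2, 0, 8), (5, 2, 2, 8)]


def totals(order_column, frame=""):
    """Sum orderQty over a window ordered by `order_column`.

    `frame` is an optional explicit frame clause; when empty, the
    engine's default frame applies.
    """
    if order_column not in ("id", "orderID"):
        raise ValueError("unexpected column name")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (id INTEGER, orderID INTEGER, "
        "prodID INTEGER, orderQty INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", SALES)
    query = (
        f"SELECT sum(orderQty) OVER (ORDER BY {order_column} {frame}) "
        "FROM sales ORDER BY id")
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result


# Unique ordering values: the default frame yields a plain running total.
print(totals("id"))
# Tied ordering values: rows with the same orderID are peers and share a total.
print(totals("orderID"))
```

Ordering by `id` gives a strict running total because no two rows tie. Ordering by `orderID` gives the same sum to every row of one order, which is the visible difference between a RANGE frame and a ROWS frame.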

Sum until threshold value reached and then reset the counter

社会主义新天地 submitted on 2019-12-01 14:32:39

    user_id | date                | distance
    1       | 2019-04-09 00:00:00 | 2
    1       | 2019-04-09 00:00:30 | 5
    1       | 2019-04-09 00:01:00 | 3
    1       | 2019-04-09 00:01:45 | 7
    1       | 2019-04-09 00:02:30 | 6
    1       | 2019-04-09 00:03:00 | 1

How do I sum distance over the following rows until a threshold is reached, and then reset the counter? For instance, if the threshold value is 10, I am trying to get the following output:

    1 | 2019-04-09 00:00:00 | 2
    1 | 2019-04-09 00:00:30 | 7  (2 + 5)
    1 | 2019-04-09 00:01:00 | 10 (7 + 3)
    1 | 2019-04-09 00:01:45 | 7  RESET
    1 | 2019-04-09 00:02:30 | 13 (7 + 6)
    1 | 2019-04-09 00:03:00 | 1  RESET

But all I could
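
Because each total depends on whether the previous total triggered a reset, a plain window frame cannot express this rule; it needs a row-by-row pass (in SQL, typically a recursive query). The intended rule can be sketched in a few lines of Python. This assumes, based on the expected output above, that the counter resets after a row whose total is greater than or equal to the threshold, and it handles a single user only. The function name is made up.

```python
def sum_with_reset(distances, threshold):
    """Running sum that starts over after reaching `threshold`.

    The row that reaches or exceeds the threshold still shows the full
    total; the reset takes effect on the following row.
    """
    totals = []
    running = 0
    for distance in distances:
        running += distance
        totals.append(running)
        if running >= threshold:
            running = 0  # the next row starts a fresh sum
    return totals


# Distances from the question, in date order, for user_id 1.
print(sum_with_reset([2, 5, 3, 7, 6, 1], threshold=10))
```

For several users, the same function could be applied to each user's rows separately, sorted by date.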

Create a group id over a window in Spark Dataframe

爱⌒轻易说出口 submitted on 2019-12-01 14:10:44
I have a dataframe where I want to assign an id to each Window partition. For example, I have:

    id | col |
    1  | a   |
    2  | a   |
    3  | b   |
    4  | c   |
    5  | c   |

So I want (based on grouping by column col):

    id | group |
    1  | 1     |
    2  | 1     |
    3  | 2     |
    4  | 3     |
    5  | 3     |

I want to use a window function but I cannot find any way to assign an id to each window. I need something like:

    w = Window().partitionBy('col')
    df = df.withColumn("group", id().over(w))

Is there any way to achieve something like that? (I cannot simply use col as a group id because I am interested in creating a window over multiple columns.) Simply using a
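
A ranking function that gives tied rows the same number and leaves no gaps, `dense_rank()`, produces exactly the group column shown above when the window is ordered by the grouping column. The sketch below demonstrates this in SQLite via Python rather than in Spark; the table name `df` and the output column name `grp` are placeholders. PySpark offers `pyspark.sql.functions.dense_rank`, which can be applied over `Window.orderBy(...)` in the same way.

```python
import sqlite3

# Rows from the question: (id, col).
DATA = [(1, "a"), (2, "a"), (3, "b"), (4, "c"), (5, "c")]

QUERY = """
SELECT id,
       dense_rank() OVER (ORDER BY col) AS grp
FROM df
ORDER BY id
"""


def group_ids(rows):
    """Assign one consecutive id per distinct value of `col`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE df (id INTEGER, col TEXT)")
    conn.executemany("INSERT INTO df VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(group_ids(DATA))
```

To group over several columns, all of them would go into the ORDER BY list of the window.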

Sum until threshold value reached and then reset the counter

守給你的承諾、 submitted on 2019-12-01 13:09:20
Question:

    user_id | date                | distance
    1       | 2019-04-09 00:00:00 | 2
    1       | 2019-04-09 00:00:30 | 5
    1       | 2019-04-09 00:01:00 | 3
    1       | 2019-04-09 00:01:45 | 7
    1       | 2019-04-09 00:02:30 | 6
    1       | 2019-04-09 00:03:00 | 1

How do I sum distance over the following rows until a threshold is reached, and then reset the counter? For instance, if the threshold value is 10, I am trying to get the following output:

    1 | 2019-04-09 00:00:00 | 2
    1 | 2019-04-09 00:00:30 | 7  (2 + 5)
    1 | 2019-04-09 00:01:00 | 10 (7 + 3)
    1 | 2019-04-09 00

avg sale of quarter with previous quarter avg sale

▼魔方 西西 submitted on 2019-12-01 12:54:30
I have a table one with various attributes such as region, product, year, qtr, month, and sale. I have to calculate the avg_qtr sale of each product within the same region and show its previous avg_qtr sale. I have read about lag, but it does not seem usable here because the number of rows after which it repeats is not fixed. My table structure is like this:

    Region | Product | Year | Qtr | Month | Sales
    NORTH  | P1      | 2015 | 1   | JAN   | 1000
    NORTH  | P1      | 2015 | 1   | FEB   | 2000
    NORTH  | P1      | 2015 | 1   | MAR   | 3000
    NORTH  | P1      | 2015 | 2   | APR   | 4000
    NORTH  | P1      | 2015 | 2   | MAY   | 5000
    NORTH  | P1      | 2015 | 2   | JUN   | 6000
    NORTH  | P1      | 2015 | 3   | JUL   | 7000
    NORTH  | P1      | 2015 | 3   | AUG   | 8000
    NORTH  | P1
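
The varying number of rows per quarter stops being a problem if the data is first reduced to one row per quarter; `lag()` then always looks back exactly one row. The following sketch applies that two-step idea in SQLite via Python. The table name `sales_data` is an assumption, and only the eight complete rows visible in the excerpt are loaded, so quarter 3 is averaged over two months.

```python
import sqlite3

# The complete rows visible in the excerpt above.
ROWS = [
    ("NORTH", "P1", 2015, 1, "JAN", 1000),
    ("NORTH", "P1", 2015, 1, "FEB", 2000),
    ("NORTH", "P1", 2015, 1, "MAR", 3000),
    ("NORTH", "P1", 2015, 2, "APR", 4000),
    ("NORTH", "P1", 2015, 2, "MAY", 5000),
    ("NORTH", "P1", 2015, 2, "JUN", 6000),
    ("NORTH", "P1", 2015, 3, "JUL", 7000),
    ("NORTH", "P1", 2015, 3, "AUG", 8000),
]

QUERY = """
WITH quarterly AS (
    -- Step 1: one row per region, product and quarter.
    SELECT region, product, year, qtr, avg(sales) AS avg_qtr
    FROM sales_data
    GROUP BY region, product, year, qtr
)
-- Step 2: with one row per quarter, lag() looks back exactly one quarter.
SELECT region, product, year, qtr, avg_qtr,
       lag(avg_qtr) OVER (
           PARTITION BY region, product
           ORDER BY year, qtr
       ) AS prev_avg_qtr
FROM quarterly
ORDER BY region, product, year, qtr
"""


def quarterly_averages(rows):
    """Average sale per quarter alongside the previous quarter's average."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_data (region TEXT, product TEXT, year INTEGER, "
        "qtr INTEGER, month TEXT, sales INTEGER)")
    conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in quarterly_averages(ROWS):
    print(row)
```

The first quarter of each region and product has no predecessor, so its previous average is null.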

how to calculate balances in an accounting software using postgres window function

[亡魂溺海] submitted on 2019-12-01 12:42:11
I've got the same problem as this one, but I am using Postgres: "Calculate balance with mysql". I have a table which contains the following data:

    ID | In     | Out
    1  | 100.00 | 0.00
    2  | 10.00  | 0.00
    3  | 0.00   | 70.00
    4  | 5.00   | 0.00
    5  | 0.00   | 60.00
    6  | 20.00  | 0.00

Now I need a query which gives me the following result:

    ID | In     | Out   | Balance
    1  | 100.00 | 0.00  | 100.00
    2  | 10.00  | 0.00  | 110.00
    3  | 0.00   | 70.00 | 40.00
    4  | 5.00   | 0.00  | 45.00
    5  | 0.00   | 60.00 | -15.00
    6  | 20.00  | 0.00  | 5.00

How best to handle the "balance" calculation? I was told there are window functions in Postgres. How would this be done using Postgres window functions? Thanks.

    select t.*, sum("In"-"Out")
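
A running balance of this kind is a cumulative sum of "In" minus "Out" ordered by ID. The sketch below runs such a query in SQLite via Python as a stand-in for Postgres; the table name `ledger` is an assumption, and the quoted column names follow the question.

```python
import sqlite3

# Rows from the question: (ID, In, Out).
LEDGER = [
    (1, 100.00, 0.00),
    (2, 10.00, 0.00),
    (3, 0.00, 70.00),
    (4, 5.00, 0.00),
    (5, 0.00, 60.00),
    (6, 20.00, 0.00),
]

# "In" is a reserved word, so the column names are double-quoted.
QUERY = """
SELECT id, "In", "Out",
       sum("In" - "Out") OVER (ORDER BY id) AS balance
FROM ledger
ORDER BY id
"""


def running_balance(rows):
    """Return each ledger row with its cumulative balance."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE ledger (id INTEGER, "In" REAL, "Out" REAL)')
    conn.executemany("INSERT INTO ledger VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in running_balance(LEDGER):
    print(row)
```

Because the IDs are unique, the default frame ends at the current row, so each balance includes exactly the rows up to and including that ID.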

Jump SQL gap over specific condition & proper lead() usage

故事扮演 submitted on 2019-12-01 12:18:57
(PostgreSQL 8.4) Continuing with my previous example, I wish to further my understanding of gaps-and-islands processing with window functions. Consider the following table and data:

    CREATE TABLE T1 (
        id SERIAL PRIMARY KEY,
        val INT,    -- some device
        status INT  -- 0=OFF, 1=ON
    );

    INSERT INTO T1 (val, status) VALUES (10, 0);
    INSERT INTO T1 (val, status) VALUES (11, 0);
    INSERT INTO T1 (val, status) VALUES (11, 1);
    INSERT INTO T1 (val, status) VALUES (10, 1);
    INSERT INTO T1 (val, status) VALUES (11, 0);
    INSERT INTO T1 (val, status) VALUES (10, 0);

As previously explained, the devices turn ON and OFF