window-functions

How to find median by attribute with Postgres window functions?

非 Y 不嫁゛ posted on 2019-12-20 07:27:14
Question: I use PostgreSQL and have records like this on groups of people:

    name    | people | indicator
    --------+--------+-----------
    group 1 |   1000 |         1
    group 2 |    100 |         2
    group 3 |   2000 |         3

I need to find the indicator for the median person. The result should be:

    group 3 |   2000 |         3

If I do

    select median(name) over (order by indicator) from table1

it will be group 2. I'm not sure if I can select this with a window function. Generating 1000/2000 rows per record seems impractical, because I have millions of
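One possible reading of "median person" is the first row, ordered by indicator, at which the running total of people reaches half of the overall total. The following is a minimal sketch of that idea. It runs through Python's sqlite3 module rather than PostgreSQL purely so that it is self-contained, and it assumes an SQLite build with window-function support (3.25 or later) and unique indicator values. The helper name median_group and the column aliases running_people and total_people are illustrative, not taken from the question.

```python
import sqlite3


def median_group(rows):
    """Return the (name, people, indicator) row that holds the median person."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (name TEXT, people INTEGER, indicator INTEGER)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    query = """
        SELECT name, people, indicator
        FROM (
            SELECT t.*,
                   -- cumulative head count up to and including this group
                   SUM(people) OVER (ORDER BY indicator) AS running_people,
                   -- head count over all groups
                   SUM(people) OVER ()                   AS total_people
            FROM table1 AS t
        )
        -- first group whose cumulative count covers half of everyone
        WHERE running_people * 2 >= total_people
        ORDER BY indicator
        LIMIT 1
    """
    result = con.execute(query).fetchone()
    con.close()
    return result


groups = [("group 1", 1000, 1), ("group 2", 100, 2), ("group 3", 2000, 3)]
print(median_group(groups))
```

With the three groups from the question, the running totals are 1000, 1100 and 3100 out of 3100, so only group 3 passes the filter.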

Window functions in SQLite3

牧云@^-^@ posted on 2019-12-20 04:54:06
Question: The following Oracle SQL select allows me to select all the rows of a table that are duplicated according to some fields, e.g., they have the same COLUMN_1, COLUMN_2 and COLUMN_3:

    SELECT *
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY COLUMN_1, COLUMN_2, COLUMN_3 ORDER BY COLUMN_1) AS rn
        FROM MY_TABLE t
    )
    WHERE rn > 1;

How do I do the very same in sqlite3?

Answer 1: You can use rowid and a correlated subquery:

    select t.* from (select t.*, (select count(*) from my_table t2 where t2.column_1 = t
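The answer above is cut off, so the following is only a sketch of the general rowid-plus-correlated-subquery idea, not a reconstruction of the answer's exact query. For each row it counts the rows with the same three key columns and a rowid no greater than its own, which plays the role of ROW_NUMBER(), and keeps rows where that count exceeds 1. The payload column and the helper name duplicate_rows are hypothetical additions for illustration. Unlike PARTITION BY, the = comparisons here do not treat NULL keys as equal.

```python
import sqlite3


def duplicate_rows(rows):
    """Return every row except the first occurrence of each key triple."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE my_table (column_1, column_2, column_3, payload)")
    con.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT t.column_1, t.column_2, t.column_3, t.payload
        FROM my_table AS t
        WHERE (
            -- emulated row number within the (column_1, column_2, column_3) group
            SELECT COUNT(*)
            FROM my_table AS t2
            WHERE t2.column_1 = t.column_1
              AND t2.column_2 = t.column_2
              AND t2.column_3 = t.column_3
              AND t2.rowid <= t.rowid
        ) > 1
        ORDER BY t.rowid
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


sample = [(1, 1, 1, "a"), (1, 1, 1, "b"), (2, 2, 2, "c"), (1, 1, 1, "d")]
print(duplicate_rows(sample))
```

SQLite 3.25 and later also accept ROW_NUMBER() OVER (...) directly, so the emulation is mainly of interest for older builds.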

Select partitions based on matches in other table

左心房为你撑大大i posted on 2019-12-20 03:35:10
Question: Having the following table (conversations):

    id | record_id | is_response | text
    ---+-----------+-------------+-----------------
     1 |         1 | false       | in text 1
     2 |         1 | true        | response text 3
     3 |         1 | false       | in text 2
     4 |         1 | true        | response text 2
     5 |         1 | true        | response text 3
     6 |         2 | false       | in text 1
     7 |         2 | true        | response text 1
     8 |         2 | false       | in text 2
     9 |         2 | true        | response text 3
    10 |         2 | true        | response text 4

And another help table (responses):

    id |

Grouping based on sequence of rows

断了今生、忘了曾经 posted on 2019-12-20 03:27:27
Question: I have a table of orders with a column denoting whether it's a buy or a sell, with the rows typically ordered by timestamp. What I'd like to do is operate on groups of consecutive buys, plus their sell, e.g.

    B B S B S B B S -> (B B S) (B S) (B B S)

Example:

    order_action | timestamp
    -------------+---------------------
    buy          | 2013-10-03 13:03:02
    buy          | 2013-10-08 13:03:02
    sell         | 2013-10-10 15:58:02
    buy          | 2013-11-01 09:30:02
    buy          | 2013-11-01 14:03:02
    sell         | 2013-11-07 10:34:02
    buy          | 2013-12-03 15
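One way to express the (B B S) (B S) grouping is to label each row with the number of sells that occurred strictly before it, so that every sell closes its own group. The sketch below assumes unique timestamps, a table named orders (the question does not give the table name), and an SQLite build with window functions (3.25 or later) driven from Python; the alias trade_group and the helper label_trades are illustrative.

```python
import sqlite3


def label_trades(rows):
    """Return (order_action, timestamp, trade_group) for each order."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE orders (order_action TEXT, "timestamp" TEXT)')
    con.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    query = """
        SELECT order_action,
               "timestamp",
               -- sells up to and including this row, minus this row if it is a sell,
               -- i.e. the number of sells strictly before the current row
               SUM(order_action = 'sell') OVER (ORDER BY "timestamp")
                   - (order_action = 'sell') AS trade_group
        FROM orders
        ORDER BY "timestamp"
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


visible_rows = [
    ("buy", "2013-10-03 13:03:02"),
    ("buy", "2013-10-08 13:03:02"),
    ("sell", "2013-10-10 15:58:02"),
    ("buy", "2013-11-01 09:30:02"),
    ("buy", "2013-11-01 14:03:02"),
    ("sell", "2013-11-07 10:34:02"),
]
for row in label_trades(visible_rows):
    print(row)
```

The resulting trade_group column can then be used in GROUP BY or PARTITION BY to operate on each buy-buy-sell block.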

How to group timestamps into islands (based on arbitrary gap)?

可紊 posted on 2019-12-20 03:20:26
Question: Consider this list of dates as timestamptz:

I grouped the dates by hand using colors: every group is separated from the next by a gap of at least 2 minutes.

I'm trying to measure how much a given user studied, by looking at when they performed an action (the data is when they finished studying a sentence), e.g. on the yellow block, I'd consider that the user studied in one sitting, from 14:24 till 14:27, or roughly 3 minutes in a row.

I see how I could group these dates with a programming
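A common shape for this kind of "gaps and islands" problem is to flag each row whose distance from the previous row reaches the gap, then take a running sum of the flags as the island number. The sketch below shows that pattern with Python's sqlite3 module (SQLite 3.25 or later assumed). The table events, the column ts, the helper study_sessions and the sample timestamps are all made up for illustration, since the question's own data is not available here.

```python
import sqlite3


def study_sessions(timestamps, gap_seconds=120):
    """Return (island, first_ts, last_ts) for runs separated by >= gap_seconds."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (ts TEXT)")
    con.executemany("INSERT INTO events VALUES (?)", [(t,) for t in timestamps])
    query = """
        WITH flagged AS (
            SELECT ts,
                   -- 1 when the distance to the previous event reaches the gap;
                   -- the first row has no predecessor and falls through to 0
                   CASE WHEN strftime('%s', ts)
                             - strftime('%s', LAG(ts) OVER (ORDER BY ts)) >= ?
                        THEN 1 ELSE 0 END AS starts_island
            FROM events
        ),
        numbered AS (
            -- running count of island starts gives each row its island number
            SELECT ts, SUM(starts_island) OVER (ORDER BY ts) AS island
            FROM flagged
        )
        SELECT island, MIN(ts), MAX(ts)
        FROM numbered
        GROUP BY island
        ORDER BY island
    """
    result = con.execute(query, (gap_seconds,)).fetchall()
    con.close()
    return result


# Hypothetical sample data, not taken from the question.
sample = [
    "2019-01-01 14:24:00",
    "2019-01-01 14:25:10",
    "2019-01-01 14:27:00",
    "2019-01-01 14:40:00",
    "2019-01-01 14:41:30",
]
for session in study_sessions(sample):
    print(session)
```

The difference between the last and first timestamp of each island then approximates the time studied in that sitting.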

Sessionize a column of numbers into groups of 30 once a threshold is met in Teradata

北战南征 posted on 2019-12-20 02:42:16
Question: Consider a column that represents "time between events":

    (5, 40, 3, 6, 0, 9, 0, 4, 5, 18, 2, 4, 3, 2)

I would like to group these into buckets of 30, but buckets that reset. Desired outcome:

    (0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)

This is because, when we get to a cumulative 30, we "reset" and begin counting again. So, since 5 + 40 > 30, we drop down to zero and begin cumulatively adding until we reach 30 (3 + 6 + 0 ...), which happens when we reach the 10th element (18). This can be implemented
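To pin down the reset rule independently of any SQL dialect, here is a small procedural sketch in Python. It assumes the bucket number increases on the element that brings the running sum to the threshold or beyond, and that the running sum then restarts from zero. Whether the comparison should be >= or > is not stated in the question; >= is an assumption here, and both give the same result on the sample data. The function name sessionize is illustrative.

```python
def sessionize(gaps, threshold=30):
    """Assign a bucket number to each gap, resetting the running sum at the threshold."""
    buckets = []
    bucket = 0
    running = 0
    for gap in gaps:
        running += gap
        if running >= threshold:
            # this element crosses the threshold: open a new bucket and reset
            bucket += 1
            running = 0
        buckets.append(bucket)
    return buckets


gaps = [5, 40, 3, 6, 0, 9, 0, 4, 5, 18, 2, 4, 3, 2]
print(sessionize(gaps))
# → [0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
```

Because each row's bucket depends on where the previous reset happened, a plain cumulative SUM() OVER cannot express this; in SQL it generally calls for a recursive query or similar row-by-row construct.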

lag to get first non null value since the previous null value

你。 posted on 2019-12-20 02:17:11
Question: Below is an example of what I'm trying to achieve in a Redshift database. I have a variable current_value and I want to create a new column value_desired that is:

- the same as current_value if the previous row is null
- equal to the last preceding non-null value if the previous row is non-null

It sounds like an easy task, but I haven't found a way to do it yet.

    row_numb  current_value  value_desired
    1
    2
    3         47             47
    4
    5         45             45
    6
    7
    8         42             42
    9         41             42
    10        40             42
    11        39             42
    12        38             42
    13
    14        36             36
    15
    16
    17        33             33
    18
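Judging from the sample rows, value_desired appears to be the first value of each unbroken run of non-null rows (42 is carried across rows 8 to 12). The sketch below states that rule procedurally in Python; it is an interpretation of the sample, not Redshift SQL, and the function name first_value_of_run is illustrative. Nulls are represented as None.

```python
def first_value_of_run(values):
    """For each non-null value, return the first value of its run of non-nulls."""
    desired = []
    run_start = None  # first value of the current non-null run, if any
    for value in values:
        if value is None:
            run_start = None          # a null ends the current run
            desired.append(None)
        else:
            if run_start is None:     # previous row was null (or absent)
                run_start = value
            desired.append(run_start)
    return desired


# Rows 1 to 17 as far as they are visible in the question.
current = [None, None, 47, None, 45, None, None, 42, 41, 40, 39, 38,
           None, 36, None, None, 33]
print(first_value_of_run(current))
```

In SQL the same rule is typically written by numbering the runs (for example with a running count of nulls) and taking the first value within each run.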

What's the default window frame for window functions

拜拜、爱过 posted on 2019-12-19 20:26:13
Question: Running the following code:

    val sales = Seq(
      (0, 0, 0, 5),
      (1, 0, 1, 3),
      (2, 0, 2, 1),
      (3, 1, 0, 2),
      (4, 2, 0, 8),
      (5, 2, 2, 8))
      .toDF("id", "orderID", "prodID", "orderQty")

    val orderedByID = Window.orderBy('id)
    val totalQty = sum('orderQty).over(orderedByID).as('running_total)
    val salesTotalQty = sales.select('*, totalQty).orderBy('id)
    salesTotalQty.show

The result is:

    +---+-------+------+--------+-------------+
    | id|orderID|prodID|orderQty|running_total|
    +---+-------+------+--------+-------
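As a point of comparison outside Spark, the sketch below runs the same six rows through SQLite (3.25 or later assumed), whose documentation gives RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW as the default frame. It checks that an ordered window with no frame clause produces the same running total as that explicit frame, and contrasts it with a frame spanning the whole partition. This illustrates SQLite's behaviour only; Spark's defaults should be confirmed against the Spark Window API documentation. The helper running_totals is illustrative.

```python
import sqlite3

# The rows from the question: (id, orderID, prodID, orderQty)
SALES = [(0, 0, 0, 5), (1, 0, 1, 3), (2, 0, 2, 1),
         (3, 1, 0, 2), (4, 2, 0, 8), (5, 2, 2, 8)]


def running_totals(frame_clause=""):
    """Sum orderQty over a window ordered by id, with an optional frame clause."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (id, orderID, prodID, orderQty)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", SALES)
    query = f"""
        SELECT SUM(orderQty) OVER (ORDER BY id {frame_clause}) AS running_total
        FROM sales
        ORDER BY id
    """
    totals = [row[0] for row in con.execute(query)]
    con.close()
    return totals


default_frame = running_totals()
explicit_frame = running_totals("RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW")
whole_partition = running_totals("ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING")

print(default_frame)     # grows row by row
print(explicit_frame)    # same as the default in SQLite
print(whole_partition)   # the grand total repeated on every row
```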

avg sale of quarter with previous quarter avg sale

这一生的挚爱 posted on 2019-12-19 11:42:41
Question: I have a table one with various attributes such as region, product, year, qtr, month and sale. I have to calculate the avg_qtr sale of each product within the same region and show the previous avg_qtr sale alongside it. I have read about lag, but it does not seem usable here, because the number of rows after which a quarter repeats is not fixed. My table structure is like this:

    Region  Product  Year  Qtr  Month  Sales
    NORTH   P1       2015  1    JAN    1000
    NORTH   P1       2015  1    FEB    2000
    NORTH   P1       2015  1    MAR    3000
    NORTH   P1       2015  2    APR    4000
    NORTH
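One way around the "not a fixed number of rows" concern is to aggregate to one row per quarter first and apply LAG to that aggregated result, where the previous quarter is always exactly one row back. The sketch below shows this with Python's sqlite3 module (SQLite 3.25 or later assumed), using only the four complete rows visible in the question. The aliases avg_qtr_sale and prev_avg_qtr_sale and the helper quarterly_averages are illustrative.

```python
import sqlite3


def quarterly_averages(rows):
    """Return each quarter's average sale next to the previous quarter's average."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE one (region, product, year, qtr, month, sales)")
    con.executemany("INSERT INTO one VALUES (?, ?, ?, ?, ?, ?)", rows)
    query = """
        WITH quarterly AS (
            -- collapse the monthly rows to one row per region/product/quarter
            SELECT region, product, year, qtr, AVG(sales) AS avg_qtr_sale
            FROM one
            GROUP BY region, product, year, qtr
        )
        SELECT region, product, year, qtr, avg_qtr_sale,
               -- previous quarter of the same region and product, NULL for the first
               LAG(avg_qtr_sale) OVER (
                   PARTITION BY region, product
                   ORDER BY year, qtr
               ) AS prev_avg_qtr_sale
        FROM quarterly
        ORDER BY region, product, year, qtr
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


visible_rows = [
    ("NORTH", "P1", 2015, 1, "JAN", 1000),
    ("NORTH", "P1", 2015, 1, "FEB", 2000),
    ("NORTH", "P1", 2015, 1, "MAR", 3000),
    ("NORTH", "P1", 2015, 2, "APR", 4000),
]
for row in quarterly_averages(visible_rows):
    print(row)
```

If the quarterly figures need to appear on every monthly row, the aggregated result can be joined back to the original table on region, product, year and qtr.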