window-functions

Postgres window function with aggregate GROUP BY

拥有回忆 submitted on 2019-12-11 06:46:22
Question: I want to get a list of email domains and the top user within each domain. My approach is to sum the questions per email, grouped by domain, and then get the top user with a window function. However, this does not work:

    SELECT domain,
           sum(questions_per_email) AS questions_per_domain,
           first_value(email) OVER (PARTITION BY domain
                                    ORDER BY questions_per_email DESC) AS top_user
    FROM (
        SELECT email,
               lower(substring(u.email from position('@' in u.email)+1)) AS domain,
               count(*) AS questions_per …
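One common fix for this pattern is to aggregate per email in a subquery first, then apply both the sum and `first_value` as window functions over the grouped rows and collapse with `DISTINCT`. A minimal runnable sketch (assumed one-row-per-question schema; SQLite ≥ 3.25 stands in for Postgres, so `substr`/`instr` replace `substring`/`position`):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE questions(email TEXT);
INSERT INTO questions VALUES
 ('a@x.com'),('a@x.com'),('b@x.com'),('c@y.com'),('c@y.com'),('c@y.com');
""")
rows = con.execute("""
SELECT DISTINCT
       domain,
       sum(questions_per_email) OVER (PARTITION BY domain) AS questions_per_domain,
       first_value(email) OVER (PARTITION BY domain
                                ORDER BY questions_per_email DESC) AS top_user
FROM (
    -- aggregate first: one row per email
    SELECT email,
           lower(substr(email, instr(email, '@') + 1)) AS domain,
           count(*) AS questions_per_email
    FROM questions
    GROUP BY email
)
ORDER BY domain
""").fetchall()
print(rows)  # [('x.com', 3, 'a@x.com'), ('y.com', 3, 'c@y.com')]
```

Window functions run after grouping, so the inner `GROUP BY` gives them one row per email to rank and sum over.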

Aggregate/Window functions restriction in Postgres Row Level Security Policy conditions

扶醉桌前 submitted on 2019-12-11 05:35:25
Question: I've been able to use dense_rank() OVER (ORDER BY ...) — which, as far as I know, is a window function — in Postgres row-level security policy conditions. However, the documentation states: "Any SQL conditional expression (returning boolean). The conditional expression cannot contain any aggregate or window functions" (emphasis mine). Can someone explain this restriction and give an example where it applies? Thanks.

Answer 1: Basically, it tells you that each row is independent in regard of row …
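The answer's point can be illustrated outside of RLS: the restriction is on the top-level boolean expression, which is evaluated per candidate row and therefore cannot itself window over the filtered rows; a window function inside a subquery is evaluated over that subquery's own row set and is fine. A sketch simulating such a per-row condition with a plain WHERE clause (SQLite ≥ 3.25; table and columns are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE docs(id INTEGER, owner TEXT, score INTEGER);
INSERT INTO docs VALUES (1,'a',10),(2,'b',30),(3,'c',20),(4,'d',30);
""")
# Per-row condition: visible only if the row's score is among the
# top 2 distinct scores. dense_rank() lives inside the subquery,
# not in the top-level boolean expression.
rows = con.execute("""
SELECT id FROM docs
WHERE score IN (
    SELECT score FROM (
        SELECT score, dense_rank() OVER (ORDER BY score DESC) AS rnk
        FROM docs
    ) WHERE rnk <= 2
)
ORDER BY id
""").fetchall()
print(rows)  # [(2,), (3,), (4,)]
```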

PostgreSQL last_value ignore nulls

女生的网名这么多〃 submitted on 2019-12-11 05:18:08
Question: I know this has already been asked, but why doesn't the solution below work? I want to fill value with the last non-null value, ordered by idx.

What I see:

     idx | coalesce
    -----+----------
       1 |        2
       2 |        4
       3 |
       4 |
       5 |       10
    (5 rows)

What I want:

     idx | coalesce
    -----+----------
       1 |        2
       2 |        4
       3 |        4
       4 |        4
       5 |       10
    (5 rows)

Code:

    with base as (
        select 1 as idx, 2 as value
        union select 2 as idx, 4 as value
        union select 3 as idx, null as value
        union select 4 as idx, null as value
        union select 5 as idx, 10 as …
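Since `last_value(... IGNORE NULLS)` is not available in Postgres, one portable gaps-and-islands workaround is to count the non-null values seen so far (which assigns each null row to the group of the last non-null row) and then take the max value per group. A runnable sketch of that trick (SQLite ≥ 3.25):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE base(idx INTEGER, value INTEGER);
INSERT INTO base VALUES (1,2),(2,4),(3,NULL),(4,NULL),(5,10);
""")
rows = con.execute("""
SELECT idx,
       max(value) OVER (PARTITION BY grp) AS filled
FROM (
    -- count(value) skips NULLs, so the running count only advances
    -- on non-null rows: nulls inherit the previous group number
    SELECT idx, value,
           count(value) OVER (ORDER BY idx) AS grp
    FROM base
)
ORDER BY idx
""").fetchall()
print(rows)  # [(1, 2), (2, 4), (3, 4), (4, 4), (5, 10)]
```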

Rolling (moving) median in Greenplum

早过忘川 submitted on 2019-12-11 04:59:51
Question: I would like to calculate the rolling median for a column in Greenplum, i.e. as below:

    | x | rolling_median_x |
    | - + ---------------- |
    | 4 | 4                |
    | 1 | 2.5              |
    | 3 | 3                |
    | 2 | 2.5              |
    | 1 | 2                |
    | 6 | 2.5              |
    | 9 | 3                |

x is an integer, and for each row rolling_median_x shows the median of x over the current and all preceding rows. E.g., for the third row, rolling_median_x = median(4, 1, 3) = 3. Things I've found out so far: the median function can't be used as a framed window function, i.e. median(x) …
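Since `median()` cannot take an OVER clause with a frame, a minimal client-side reference computation of the desired result (an expanding median over the current and all preceding rows) looks like this — useful for validating whatever SQL workaround is chosen:

```python
import statistics

def rolling_median(xs):
    # median of the prefix xs[0..i] for each position i
    return [statistics.median(xs[:i + 1]) for i in range(len(xs))]

print(rolling_median([4, 1, 3, 2, 1, 6, 9]))
# [4, 2.5, 3, 2.5, 2, 2.5, 3]
```

This reproduces the question's expected column exactly.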

Count rows in partition with Order By

你离开我真会死。 submitted on 2019-12-11 03:33:48
Question: I was trying to understand PARTITION BY in Postgres by writing a few sample queries. I have a test table on which I run my query:

     id integer | num integer
    ------------+-------------
              1 |           4
              2 |           4
              3 |           5
              4 |           6

When I run the following query, I get the output I expected:

    SELECT id, COUNT(id) OVER (PARTITION BY num) FROM test;

     id | count
    ----+-------
      1 |     2
      2 |     2
      3 |     1
      4 |     1

But when I add ORDER BY to the partition:

    SELECT id, COUNT(id) OVER (PARTITION BY num ORDER BY id) FROM test;

     id …
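What changes is the window frame: with ORDER BY inside OVER(), the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so COUNT becomes a running count within each partition. Spelling out the full frame restores the whole-partition count. A runnable demonstration (SQLite ≥ 3.25 follows the same standard defaults as Postgres here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE test(id INTEGER, num INTEGER);
INSERT INTO test VALUES (1,4),(2,4),(3,5),(4,6);
""")
# Default frame with ORDER BY: running count within the partition.
running = con.execute("""
SELECT id, COUNT(id) OVER (PARTITION BY num ORDER BY id)
FROM test ORDER BY id
""").fetchall()
# Explicit full frame: count over the whole partition again.
whole = con.execute("""
SELECT id, COUNT(id) OVER (PARTITION BY num ORDER BY id
         ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM test ORDER BY id
""").fetchall()
print(running)  # [(1, 1), (2, 2), (3, 1), (4, 1)]
print(whole)    # [(1, 2), (2, 2), (3, 1), (4, 1)]
```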

Can multiple rows within a window be referenced by an analytic function?

大憨熊 submitted on 2019-12-11 02:39:19
Question: Given a table with:

    ID VALUE
    -- -----
     1     1
     2     2
     3     3
     4     4

I would like to compute something like this:

    ID VALUE SUM
    -- ----- ---
     1     1  40   -- (2-1)*2 + (3-1)*3 + (4-1)*4 + (5-1)*5
     2     2  26   -- (3-2)*3 + (4-2)*4 + (5-2)*5
     3     3  14   -- (4-3)*4 + (5-3)*5
     4     4   5   -- (5-4)*5
     5     5   0   -- 0

where the SUM on each row is the sum of the values of each subsequent row, multiplied by the difference between the value of the subsequent row and the value of the current row. I could start with something like this:

    CREATE TABLE x(id int, …
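Yes — a framed window function can reference all the following rows, and the per-row sum factors algebraically: Σⱼ₍ⱼ﹥ᵢ₎ (vⱼ − vᵢ)·vⱼ = Σ vⱼ² − vᵢ·Σ vⱼ, with both sums taken over the following rows. That reduces the problem to two window aggregates over a `ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING` frame. A runnable sketch (SQLite ≥ 3.25):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE x(id INTEGER, value INTEGER);
INSERT INTO x VALUES (1,1),(2,2),(3,3),(4,4),(5,5);
""")
rows = con.execute("""
SELECT id, value,
       -- sum(v_j^2) - v_i * sum(v_j), both over the following rows;
       -- COALESCE handles the empty frame on the last row
       COALESCE(sum(value * value) OVER (ORDER BY id
           ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING), 0)
       - value * COALESCE(sum(value) OVER (ORDER BY id
           ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING), 0) AS total
FROM x ORDER BY id
""").fetchall()
print(rows)  # [(1, 1, 40), (2, 2, 26), (3, 3, 14), (4, 4, 5), (5, 5, 0)]
```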

How can I get the Redshift/Postgresql LAG window function to selectively exclude records?

為{幸葍}努か submitted on 2019-12-11 02:32:49
Question: I have this table in Redshift, and I'm trying to write a query for the following dataset. For items such as row #3, which are 'renewal successes' preceded by a 'sub success', I want to flag them as is_first_renewal = true. But they might have been preceded by any number of 'RENEWAL failures' before they succeeded, so I can't use the window function LAG for this scenario. I also cannot filter out failures, as my query needs those.

    id phone op ts pr status result is_first_renewal
     1 …
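Redshift's LAG accepts an IGNORE NULLS option, so one common pattern is `LAG(CASE WHEN result <> 'FAILURE' THEN op END) IGNORE NULLS OVER (PARTITION BY phone ORDER BY ts)`, which skips the failure rows without filtering them out. SQLite has no IGNORE NULLS, so this sketch emulates "previous non-failure event per phone" with a correlated subquery instead; the schema and values are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE subs(id INTEGER, phone TEXT, ts INTEGER, op TEXT, result TEXT);
INSERT INTO subs VALUES
 (1,'p1',1,'SUB','SUCCESS'),
 (2,'p1',2,'RENEWAL','FAILURE'),
 (3,'p1',3,'RENEWAL','SUCCESS');
""")
rows = con.execute("""
SELECT id, op, result,
       -- flag renewal successes whose latest earlier NON-failure
       -- event (failures skipped, not filtered out) was a SUB
       CASE WHEN op = 'RENEWAL' AND result = 'SUCCESS'
                 AND (SELECT s2.op FROM subs s2
                      WHERE s2.phone = s.phone AND s2.ts < s.ts
                        AND s2.result <> 'FAILURE'
                      ORDER BY s2.ts DESC LIMIT 1) = 'SUB'
            THEN 1 ELSE 0 END AS is_first_renewal
FROM subs s ORDER BY id
""").fetchall()
print(rows)
# [(1, 'SUB', 'SUCCESS', 0), (2, 'RENEWAL', 'FAILURE', 0),
#  (3, 'RENEWAL', 'SUCCESS', 1)]
```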

Group and count events per time intervals, plus running total

家住魔仙堡 submitted on 2019-12-11 02:23:39
Question: I'm a fairly new Postgres user; I'm sure there's an answer to this already, but I can't find it. I need to analyze some data in an activity-log table, grouping the results by time period. A simple version of the problem would be a table with three fields:

        Column    |           Type           | Modifiers
    --------------+--------------------------+-----------
     period_start | timestamp with time zone | not null
     user_id      | text                     | not null
     action       | text                     | not null

The action string I want to …
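The usual shape of the answer is: bucket events into intervals with an aggregate GROUP BY, then layer a running total on top with a window SUM over the grouped rows. In Postgres the bucket would be `date_trunc('month', period_start)`; in this runnable sketch `strftime('%Y-%m', ...)` stands in (SQLite ≥ 3.25; the sample rows are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE activity(period_start TEXT, user_id TEXT, action TEXT);
INSERT INTO activity VALUES
 ('2019-01-05','u1','open'), ('2019-01-20','u2','open'),
 ('2019-02-11','u1','open'), ('2019-03-02','u3','open');
""")
rows = con.execute("""
SELECT month, n, sum(n) OVER (ORDER BY month) AS running_total
FROM (
    -- per-interval counts first...
    SELECT strftime('%Y-%m', period_start) AS month, count(*) AS n
    FROM activity GROUP BY month
)
-- ...then the window sum accumulates them in month order
ORDER BY month
""").fetchall()
print(rows)  # [('2019-01', 2, 2), ('2019-02', 1, 3), ('2019-03', 1, 4)]
```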

Selecting sum and running balance for last 18 months with generate_series

会有一股神秘感。 submitted on 2019-12-11 00:59:59
Question: I have this working query, but I need to add all months to my result, whether or not any items were sold during that month:

    select * from (
        select to_char(max(change_date), 'YYYY-MON')::varchar(8) as yyyymmm,
               max(change_date) as yearmonth,
               sum(vic.sold_qty / item_size.qty)::numeric(18,2) as sold_qty,   -- sold monthly
               sum(sum(on_hand)) OVER (PARTITION BY vic.item_id
                                       ORDER BY year, month) as on_hand        -- running balance
        from (((view_item_change vic
            left join item on vic.item_id = item.item_id)
            left join …
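The standard approach is to manufacture the full month series with `generate_series()` and LEFT JOIN the sales data onto it, so empty months appear with zero sales while the running balance carries forward. SQLite has no built-in `generate_series`, so a recursive CTE plays that role in this runnable sketch (table and values invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales(month TEXT, qty INTEGER);
INSERT INTO sales VALUES ('2019-01', 5), ('2019-03', 2);
""")
rows = con.execute("""
WITH RECURSIVE months(m) AS (
    -- stand-in for Postgres generate_series over months
    SELECT '2019-01'
    UNION ALL
    SELECT strftime('%Y-%m', date(m || '-01', '+1 month'))
    FROM months WHERE m < '2019-04'
)
SELECT m,
       COALESCE(s.qty, 0) AS sold,                       -- 0 for empty months
       sum(COALESCE(s.qty, 0)) OVER (ORDER BY m) AS running
FROM months LEFT JOIN sales s ON s.month = m
ORDER BY m
""").fetchall()
print(rows)
# [('2019-01', 5, 5), ('2019-02', 0, 5), ('2019-03', 2, 7), ('2019-04', 0, 7)]
```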

Spark: “Cannot use an UnspecifiedFrame. This should have been converted during analysis. Please file a bug report”

余生长醉 submitted on 2019-12-10 19:57:57
Question: Spark 2.3.0 with Scala 2.11. I am trying to write a custom aggregator and run it over a window function per these docs, but I am getting the error in the title. Here is a stripped-down example, written as a FunSuite test. I know the error message says to file a bug report, but this is such a simple example, lifted almost directly from the documentation, that I wonder if something in my code is causing the error. I wonder if using a collection type as the buffer is somehow not …