window-functions

Why no windowed functions in where clauses?

Submitted by 淺唱寂寞╮ on 2019-11-26 05:32:43
Question: The title says it all: why can't I use a window function in a WHERE clause in SQL Server? This query makes perfect sense:

    select id, sales_person_id, product_type, product_id, sale_amount
    from Sales_Log
    where 1 = row_number() over(partition by sales_person_id, product_type, product_id
                                order by sale_amount desc)

But it doesn't work. Is there a better way than a CTE/subquery?

EDIT: For what it's worth, this is the query with a CTE:

    with Best_Sales as (
        select id, sales_person_id, product_type, …
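Window functions are evaluated only after the WHERE, GROUP BY, and HAVING phases of logical query processing, so the predicate cannot see the row_number() value. A minimal sketch of the usual workaround, reusing the question's own Sales_Log columns: compute the window function in a derived table, then filter one level up.

    select id, sales_person_id, product_type, product_id, sale_amount
    from (
        select id, sales_person_id, product_type, product_id, sale_amount,
               row_number() over (partition by sales_person_id, product_type, product_id
                                  order by sale_amount desc) as rn
        from Sales_Log
    ) as ranked
    where rn = 1;

The derived table is logically equivalent to the CTE shown in the edit; in SQL Server the optimizer treats the two forms essentially the same, so the choice is a matter of readability.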

Spark Window Functions - rangeBetween dates

Submitted by 我们两清 on 2019-11-26 03:56:26
Question: I have a Spark SQL DataFrame with date data, and I am trying to get all the rows preceding the current row within a given date range. For example, I want all the rows from 7 days back preceding the given row. I figured out that I need to use a Window function like:

    Window \
        .partitionBy('id') \
        .orderBy('start')

and here comes the problem: I want a rangeBetween of 7 days, but there is nothing in the Spark docs I could find on this. Does Spark even provide such an option? For now I'm …
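A sketch of the commonly used workaround, assuming a DataFrame df with the question's id and timestamp column start: rangeBetween counts in the units of the orderBy expression, so ordering by the timestamp cast to epoch seconds lets the 7-day window be expressed in seconds.

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    def days(i):
        # rangeBetween operates on the long orderBy value, here epoch seconds
        return i * 86400

    w = (Window
         .partitionBy('id')
         .orderBy(F.col('start').cast('timestamp').cast('long'))
         .rangeBetween(-days(7), 0))

    # e.g. a running row count over the trailing 7 days
    df_7d = df.withColumn('rows_7d', F.count(F.lit(1)).over(w))

Note this is a range frame, not a row frame: every row whose start falls within the preceding 7 days is included, regardless of how many rows that is.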

PostgreSQL: running count of rows for a query 'by minute'

Submitted by 人走茶凉 on 2019-11-26 03:16:40
Question: I need to query, for each minute, the total count of rows up to that minute. The best I could achieve so far doesn't do the trick: it returns the count per minute, not the total count up to each minute:

    SELECT COUNT(id) AS count
         , EXTRACT(hour FROM "when") AS hour
         , EXTRACT(minute FROM "when") AS minute
    FROM   mytable
    GROUP  BY hour, minute

Answer 1: Only return minutes with activity

Shortest:

    SELECT DISTINCT
           date_trunc('minute', "when") AS minute
         , count(*) OVER (ORDER BY date_trunc('minute', "when")) …
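A hedged completion of the truncated answer above, assuming mytable has a timestamptz column "when": with an ORDER BY inside the OVER clause, the default window frame runs from the start up to the current row and its peers, so count(*) becomes a cumulative count, and DISTINCT collapses the duplicate rows within each minute.

    SELECT DISTINCT
           date_trunc('minute', "when") AS minute
         , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
    FROM   mytable
    ORDER  BY 1;

As the answer's own heading says, this only returns minutes that actually contain rows; covering minutes without activity would additionally require a LEFT JOIN against generate_series().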

Pandas get topmost n records within each group

Submitted by ≡放荡痞女 on 2019-11-26 00:53:43
Question: Suppose I have a pandas DataFrame like this:

    >>> df = pd.DataFrame({'id': [1,1,1,2,2,2,2,3,4], 'value': [1,2,3,1,2,3,4,1,1]})
    >>> df
       id  value
    0   1      1
    1   1      2
    2   1      3
    3   2      1
    4   2      2
    5   2      3
    6   2      4
    7   3      1
    8   4      1

I want to get a new DataFrame with the top 2 records for each id, like this:

       id  value
    0   1      1
    1   1      2
    3   2      1
    4   2      2
    7   3      1
    8   4      1

I can do it by numbering the records within each group after a group by:

    >>> dfN = df.groupby('id').apply(lambda x: x['value'].reset_index()).reset_index()
    >>> dfN
       id  level_1  index  value
    0   1 …
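The excerpt stops before the answer, so here is a short sketch of the idiomatic solution: head(n) applied after a groupby keeps the first n rows of each group while preserving the original index, which matches the desired output exactly.

    import pandas as pd

    df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2, 3, 4],
                       'value': [1, 2, 3, 1, 2, 3, 4, 1, 1]})

    # first two rows per id, original index preserved
    top2 = df.groupby('id').head(2)
    print(top2)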

Best way to get result count before LIMIT was applied

Submitted by 好久不见. on 2019-11-25 23:42:11
Question: When paging through data that comes from a DB, you need to know how many pages there will be in order to render the page-jump controls. Currently I do that by running the query twice: once wrapped in a count() to determine the total results, and a second time with a LIMIT applied to get back just the results I need for the current page. This seems inefficient. Is there a better way to determine how many results would have been returned before LIMIT was applied? I am using PHP and Postgres.

Answer 1: Pure …
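One window-function answer to this: count(*) with an empty OVER clause returns the total number of rows the query would produce before LIMIT is applied, attached to every returned row, so a single query serves both the page and the pager. A sketch with a hypothetical table tbl and sort column id:

    SELECT *
         , count(*) OVER () AS full_count   -- total rows before LIMIT
    FROM   tbl
    ORDER  BY id
    LIMIT  20;

The trade-off: if the requested page is past the end of the result set, no rows come back at all, and therefore no count either.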

Dynamic alternative to pivot with CASE and GROUP BY

Submitted by 空扰寡人 on 2019-11-25 23:07:25
Question: I have a table that looks like this:

    id  feh  bar
    1   10   A
    2   20   A
    3    3   B
    4    4   B
    5    5   C
    6    6   D
    7    7   D
    8    8   D

And I want it to look like this:

    bar  val1  val2  val3
    A    10    20
    B     3     4
    C     5
    D     6     7     8

I have this query that does this:

    SELECT bar,
           MAX(CASE WHEN abc."row" = 1 THEN feh ELSE NULL END) AS "val1",
           MAX(CASE WHEN abc."row" = 2 THEN feh ELSE NULL END) AS "val2",
           MAX(CASE WHEN abc."row" = 3 THEN feh ELSE NULL END) AS "val3"
    FROM (
        SELECT bar, feh, row_number() OVER (partition by bar) as row
        FROM …
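The query in the excerpt is cut off, so the following is a hedged completion rather than the original, with two assumptions: the source table is named tbl (the real name is cut off above), and the subquery's row_number() is given an ORDER BY feh, since the excerpt's version has no ORDER BY and would number rows nondeterministically.

    SELECT bar
         , MAX(CASE WHEN rn = 1 THEN feh END) AS val1
         , MAX(CASE WHEN rn = 2 THEN feh END) AS val2
         , MAX(CASE WHEN rn = 3 THEN feh END) AS val3
    FROM  (
        SELECT bar, feh
             , row_number() OVER (PARTITION BY bar ORDER BY feh) AS rn
        FROM   tbl
    ) sub
    GROUP  BY bar
    ORDER  BY bar;

For a truly dynamic number of output columns, PostgreSQL's usual tool is crosstab() from the additional module tablefunc, since a plain SQL query cannot return a variable number of columns.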

PostgreSQL unnest() with element number

Submitted by 徘徊边缘 on 2019-11-25 21:52:33
Question: When I have a column with separated values, I can use the unnest() function:

    myTable
    id | elements
    ---+-------------
     1 | ab,cd,efg,hi
     2 | jk,lm,no,pq
     3 | rstuv,wxyz

    select id, unnest(string_to_array(elements, ',')) AS elem
    from myTable

    id | elem
    ---+-----
     1 | ab
     1 | cd
     1 | efg
     1 | hi
     2 | jk
    ...

How can I include element numbers? I.e.:

    id | elem | nr
    ---+------+---
     1 | ab   | 1
     1 | cd   | 2
     1 | efg  | 3
     1 | hi   | 4
     2 | jk   | 1
    ...

I want the original position of each element in the source string. I've …
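The excerpt ends mid-question, but the canonical PostgreSQL answer (9.4+) is WITH ORDINALITY, which numbers the rows of a set-returning function in their original order:

    SELECT t.id, a.elem, a.nr
    FROM   myTable t
         , unnest(string_to_array(t.elements, ',')) WITH ORDINALITY a(elem, nr);

Unlike bolting row_number() onto the unnested rows, WITH ORDINALITY is guaranteed to reflect the position of each element in the source string.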