window-functions

Top n distinct values of one column in Oracle

烂漫一生 submitted on 2020-01-11 10:37:42
Question: I'm using a query where one part gets the top 3 values of a certain column. It builds a distinct subquery on the column, limits it to 3 rows, and then joins those rows back to the main query:

    WITH subquery AS (
        SELECT col
        FROM (SELECT DISTINCT col FROM tbl)
        WHERE ROWNUM <= 3
    )
    SELECT tbl.col
    FROM tbl, subquery
    WHERE tbl.col = subquery.col

The original table looks like this:

    col
    -----
    a
    a
    a
    b
    b
    b
    c
    d
    d
    e
    f
    f
    f
    f

And the query returns the top 3 values of the column (not the top 3 rows) …
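A more direct alternative, sketched below, is to rank the distinct values with DENSE_RANK and filter on the rank. Table and column names (tbl, col) follow the question; ordering by col ascending is an assumption about what "top" means here. Because DENSE_RANK gives duplicates the same rank, this keeps every row whose value is among the 3 smallest distinct values:

    SELECT col
    FROM (
        SELECT col,
               DENSE_RANK() OVER (ORDER BY col) AS rnk
        FROM tbl
    )
    WHERE rnk <= 3;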

Select random row for each group

岁酱吖の submitted on 2020-01-09 10:25:07
Question: I have a table like this:

    ID  ATTRIBUTE
    1   A
    1   A
    1   B
    1   C
    2   B
    2   C
    2   C
    3   A
    3   B
    3   C

I'd like to select just one random attribute for each ID. The result could therefore look like this (although this is just one of many options):

    ATTRIBUTE
    B
    C
    C

This is my attempt at the problem:

    SELECT "ATTRIBUTE"
    FROM (
        SELECT "ID", "ATTRIBUTE",
               row_number() OVER (PARTITION BY "ID" ORDER BY random()) AS rownum
        FROM table
    ) shuffled
    WHERE rownum = 1

However, I don't know if this is a good solution, as I need to …
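The random() call and quoted identifiers suggest PostgreSQL. If so, a shorter equivalent is DISTINCT ON, which keeps the first row per ID after a randomized sort; a minimal sketch, assuming the table is named tbl (table itself is a reserved word):

    SELECT DISTINCT ON ("ID") "ATTRIBUTE"
    FROM tbl
    ORDER BY "ID", random();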

Avoid performance impact of a single partition mode in Spark window functions

血红的双手。 submitted on 2020-01-08 17:42:07
Question: My question is triggered by the use case of calculating the differences between consecutive rows in a Spark dataframe. For example, I have:

    >>> df.show()
    +-----+----------+
    |index|      col1|
    +-----+----------+
    |  0.0|0.58734024|
    |  1.0|0.67304325|
    |  2.0|0.85154736|
    |  3.0| 0.5449719|
    +-----+----------+

If I choose to calculate these using "Window" functions, then I can do that like so:

    >>> winSpec = Window.partitionBy(df.index >= 0).orderBy(df.index.asc())
    >>> import pyspark.sql.functions as f
    >>> …
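The partitionBy(df.index >= 0) trick puts every row into one partition, which forces all data onto a single executor. One way to sidestep the single-partition window entirely, sketched below in Spark SQL, is a self-join on consecutive index values; this assumes index is a gapless integer sequence and the dataframe is registered as a view named t:

    SELECT cur.index,
           cur.col1 - prev.col1 AS diff
    FROM t AS cur
    JOIN t AS prev
      ON prev.index = cur.index - 1;

The join still shuffles, but it parallelizes across partitions instead of serializing the whole dataset through one task.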

SQL views, grouping by most sold items and customers who purchased most

爷,独闯天下 submitted on 2020-01-06 04:52:28
Question: This is my table. Using this query, I am getting the most sold items:

    SELECT [Purchased Item], SUM([Overall Quantity purchased])
    FROM ReportDraft
    GROUP BY [Purchased Item]
    ORDER BY SUM([Overall Quantity purchased])

This returns items and the total quantity purchased. Can I somehow create a table like this?

    ItemName | Total quantity purchased | Customer who purchased most | Customer quantity bought
    Pie      | 11                       | ALEX                        | 3

Thanks!
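A sketch of one way to get both columns at once, assuming the table also has a [Customer] column (a guess, since the full schema is not shown): aggregate per item and customer, compute the item total with a windowed sum over the aggregates, and keep the top customer per item with ROW_NUMBER.

    WITH per_customer AS (
        SELECT [Purchased Item] AS ItemName,
               [Customer],
               SUM([Overall Quantity purchased]) AS CustomerQty,
               SUM(SUM([Overall Quantity purchased]))
                   OVER (PARTITION BY [Purchased Item]) AS TotalQty,
               ROW_NUMBER() OVER (PARTITION BY [Purchased Item]
                                  ORDER BY SUM([Overall Quantity purchased]) DESC) AS rn
        FROM ReportDraft
        GROUP BY [Purchased Item], [Customer]
    )
    SELECT ItemName, TotalQty, [Customer], CustomerQty
    FROM per_customer
    WHERE rn = 1;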

Aggregation function to get the difference or ratio of two rows in order

和自甴很熟 submitted on 2020-01-06 03:40:08
Question: I have a table full of prices, items, and dates. An example of the data:

    AA, 1/2/3024, 1.22
    AA, 1/3/3024, 1.23
    BB, 1/2/3024, 4.22
    BB, 1/3/3024, 4.23

Within the data there are only two rows per item, and they are ordered by date. How would I condense this data set into a single row per product showing the difference from the last price to the previous one? (This also applies to a ratio, so AA would produce 1.23/1.22.) The result should look like:

    AA, today's price - yesterday's price

Despite being a sum …
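A minimal sketch, assuming a table prices(item, price_date, price): LAG supplies the previous price within each item, and filtering to rows that have a predecessor leaves exactly one row per product when there are two rows each.

    SELECT item,
           price - prev_price AS diff,
           price / prev_price AS ratio
    FROM (
        SELECT item, price_date, price,
               LAG(price) OVER (PARTITION BY item ORDER BY price_date) AS prev_price
        FROM prices
    ) t
    WHERE prev_price IS NOT NULL;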

Finding if current row is last row to be selected from database

两盒软妹~` submitted on 2020-01-05 12:46:46
Question: I am selecting a list of periods from the database. If the current row is the first row, then the period starts with its date, and I can find the interval from the period start like this:

    SELECT ...
        CASE WHEN row_number() OVER (ORDER BY r.created_at ASC) = 1
             THEN r.created_at - r.created_at::date
             ELSE NULL
        END AS period
        ...
    FROM mytable r

How can I do the same for the last row, to find the time between the r.created_at of the last row and the midnight of its date? I am aware of the first and last functions in PostgreSQL (https:/ …
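A sketch of the symmetric case, keeping the question's names: ranking in descending order marks the last row, and the time remaining until midnight is the start of the next day minus created_at.

    SELECT r.created_at,
           CASE WHEN row_number() OVER (ORDER BY r.created_at DESC) = 1
                THEN (r.created_at::date + 1)::timestamp - r.created_at
                ELSE NULL
           END AS time_to_midnight
    FROM mytable r;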

Alternative to SQL window functions in Sybase

纵然是瞬间 submitted on 2020-01-05 09:09:20
Question: I am working on Sybase Adaptive Server Enterprise (version 12.5.0.3) and trying to use row_number() OVER (PARTITION BY columnname ORDER BY columnname). When I execute the query, it throws an exception saying that the syntax near OVER is incorrect. I have searched for the proper row_number() syntax for a Sybase database and there is nothing wrong with the syntax, so I guess the Sybase version I am using does not support row_number() OVER. I even tried dense_rank() OVER, but am getting the …
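ASE 12.5 predates window-function support, so a common workaround is to emulate row_number() with a correlated COUNT(*). A sketch for a hypothetical table t(grp, val), assuming val is unique within each group (with duplicates this yields a rank with ties instead):

    SELECT t.grp, t.val,
           (SELECT COUNT(*)
            FROM t AS t2
            WHERE t2.grp = t.grp
              AND t2.val <= t.val) AS row_num
    FROM t
    ORDER BY t.grp, t.val

Note that this scans the table once per row, so it is only practical for modest group sizes.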

Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause

北慕城南 submitted on 2020-01-03 14:15:05
Question: I have the SQL query below, but I face a problem when executing it:

    SELECT *
    FROM (
        SELECT row_number() OVER (ORDER BY FloorUserId) AS 'row_number',
               FloorUserId,
               max(CASE WHEN AreaId='[G]' OR AreaId=N'L01' THEN 'X' ELSE ' ' END) AS 'L01',
               max(CASE WHEN AreaId='[G]' OR AreaId=N'L02' THEN 'X' ELSE ' ' END) AS 'L02'
        FROM floor, tbuser
        WHERE FloorUserId = tbuser.userID
    ) AS derivedTable
    WHERE row_number BETWEEN 1 AND 20

But I keep getting the following error: Column 'FloorId' is invalid in the select list …
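The inner query mixes MAX() aggregates with bare columns but has no GROUP BY; every non-aggregated column in the select list must appear in the GROUP BY (the error naming FloorId suggests the real query selects more columns than shown). A sketch of the likely fix, keeping the question's column names:

    SELECT *
    FROM (
        SELECT row_number() OVER (ORDER BY FloorUserId) AS 'row_number',
               FloorUserId,
               max(CASE WHEN AreaId='[G]' OR AreaId=N'L01' THEN 'X' ELSE ' ' END) AS 'L01',
               max(CASE WHEN AreaId='[G]' OR AreaId=N'L02' THEN 'X' ELSE ' ' END) AS 'L02'
        FROM floor, tbuser
        WHERE FloorUserId = tbuser.userID
        GROUP BY FloorUserId
    ) AS derivedTable
    WHERE row_number BETWEEN 1 AND 20;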

Running total using window function in SQL has same result for same data

时光总嘲笑我的痴心妄想 submitted on 2020-01-02 08:32:35
Question: Every reference I found on how to do a cumulative sum / running total says it's better to use a window function, so I did:

    SELECT grandtotal,
           SUM(grandtotal) OVER (ORDER BY agentname)
    FROM call

But I realized the results are only correct as long as the value of each row is different. Here is the result. Is there any way to fix this?

Answer 1: You might want to review the documentation on window specifications (which is here). The default is "range between", which defines the range by the …
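The fix the answer is pointing at, sketched with the question's names: make the frame row-based instead of the default RANGE frame, so peer rows with the same agentname are no longer summed together.

    SELECT grandtotal,
           SUM(grandtotal) OVER (
               ORDER BY agentname
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM call;

With duplicate agentname values the order within ties is still arbitrary, so adding a unique tiebreaker column to the ORDER BY makes the running total deterministic.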

Using multiple columns in dplyr window functions?

我的未来我决定 submitted on 2020-01-02 05:37:05
Question: Coming from SQL, I would expect to be able to do something like the following in dplyr. Is this possible?

    # R
    tbl %>% mutate(n = dense_rank(Name, Email))

    -- SQL
    SELECT Name, Email,
           DENSE_RANK() OVER (ORDER BY Name, Email) AS n
    FROM tbl

Also, is there an equivalent of PARTITION BY?

Answer 1: I did struggle with this problem and here is my solution: in case you can't find any function that supports ordering by multiple variables, I suggest that you concatenate them by their priority level from …
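The answer's concatenation idea, expressed in SQL terms as a sketch (the '|' separator is an assumption): ranking on a combined key approximates ranking on two columns, provided a delimiter keeps distinct pairs like ('ab','c') and ('a','bc') from colliding.

    SELECT Name, Email,
           DENSE_RANK() OVER (ORDER BY Name || '|' || Email) AS n
    FROM tbl;

The resulting ranks count distinct pairs correctly, though the sort order can differ from true two-column ordering when one Name value is a prefix of another.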