window-functions

What is ROWS UNBOUNDED PRECEDING used for in Teradata?

流过昼夜 submitted on 2019-11-27 00:10:47
Question: I am just starting on Teradata and have come across an Ordered Analytical Function called "ROWS UNBOUNDED PRECEDING". I tried several sites to learn about it, but all of them use a complicated example to explain it. Could you please provide a simple example so that I can get the basics clear?

Answer 1: It's the "frame" or "range" clause of window functions, which are part of the SQL standard and implemented in many databases, including Teradata. A simple
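
A minimal running-total sketch of what the frame clause does, assuming a hypothetical sales_fact table with account_id, txn_date and amount columns (none of these names come from the question):

-- ROWS UNBOUNDED PRECEDING is shorthand for
-- ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW:
-- the frame starts at the first row of the partition and ends at the current row.
SELECT account_id,
       txn_date,
       amount,
       SUM(amount) OVER (
           PARTITION BY account_id
           ORDER BY txn_date
           ROWS UNBOUNDED PRECEDING
       ) AS running_total
FROM sales_fact;

Each row's running_total is the sum of every amount from the start of its account's partition up to and including that row, which is the classic cumulative-sum use of the frame.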

How to aggregate over rolling time window with groups in Spark

别来无恙 submitted on 2019-11-26 22:59:14
Question: I have some data that I want to group by a certain column, then aggregate a series of fields based on a rolling time window within each group. Here is some example data:

df = spark.createDataFrame([
    Row(date='2016-01-01', group_by='group1', get_avg=5, get_first=1),
    Row(date='2016-01-10', group_by='group1', get_avg=5, get_first=2),
    Row(date='2016-02-01', group_by='group2', get_avg=10, get_first=3),
    Row(date='2016-02-28', group_by='group2', get_avg=20, get_first=3),
    Row(date='2016-02-29', group_by=
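
One way to express this in Spark 2.0+ is the SQL window() grouping function run through spark.sql(). A hedged sketch, assuming the DataFrame above has been registered as a temporary view named df and that a 7-day window sliding by 1 day is the intent (both are illustrative, not taken from the question):

-- Group by the key column plus a sliding 7-day time window,
-- then aggregate the other fields within each (group, window) bucket.
SELECT group_by,
       window(CAST(date AS TIMESTAMP), '7 days', '1 day') AS time_window,
       AVG(get_avg)     AS avg_in_window,
       FIRST(get_first) AS first_in_window
FROM df
GROUP BY group_by,
         window(CAST(date AS TIMESTAMP), '7 days', '1 day')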

SparkR window function

一世执手 submitted on 2019-11-26 21:44:31
Question: I found from JIRA that the 1.6 release of SparkR has implemented window functions including lag and rank, but the over function is not implemented yet. How can I use a window function like lag without over in SparkR (not the Spark SQL way)? Can someone provide an example?

Answer 1: Spark 2.0.0+: SparkR provides DSL wrappers with over, window.partitionBy / partitionBy, window.orderBy / orderBy and rowsBetween / rangeBetween functions. Spark <= 1.6: Unfortunately, it is not possible in 1.6.0. While
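
For Spark 1.6, where the SparkR DSL exposes lag but not over, the practical workaround was the SQL route the asker hoped to avoid: register the DataFrame as a temporary table and run the window expression through sql() on a HiveContext. A hedged sketch of that SQL (the view name df and the columns grp, ts, value are placeholders):

-- LAG over a partition; in SparkR 1.6 this can be executed with
-- sql(hiveContext, "...") after registerTempTable(df, "df").
SELECT grp,
       ts,
       value,
       LAG(value, 1) OVER (PARTITION BY grp ORDER BY ts) AS previous_value
FROM df

From Spark 2.0 onwards the DSL wrappers mentioned in the answer express the same thing without dropping into SQL.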

Why no windowed functions in where clauses?

最后都变了- submitted on 2019-11-26 20:28:57
Title says it all: why can't I use a windowed function in a WHERE clause in SQL Server? This query makes perfect sense:

select id, sales_person_id, product_type, product_id, sale_amount
from Sales_Log
where 1 = row_number() over(partition by sales_person_id, product_type, product_id order by sale_amount desc)

But it doesn't work. Is there a better way than a CTE/subquery? EDIT: For what it's worth, this is the query with a CTE:

with Best_Sales as (
    select id, sales_person_id, product_type, product_id, sale_amount,
           row_number() over (partition by sales_person_id, product_type, product_id order by
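
The usual explanation is logical processing order: WHERE is evaluated before the SELECT phase in which window functions are computed, so the row numbers simply do not exist yet when the filter runs. A hedged sketch of the standard workaround, completing the CTE along the lines the question implies (the rn alias and the final filter are my reconstruction, not copied from the post):

-- Compute the window function first, then filter on it in the outer query.
with Best_Sales as (
    select id, sales_person_id, product_type, product_id, sale_amount,
           row_number() over (partition by sales_person_id, product_type, product_id
                              order by sale_amount desc) as rn
    from Sales_Log
)
select id, sales_person_id, product_type, product_id, sale_amount
from Best_Sales
where rn = 1;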

Calculating Cumulative Sum in PostgreSQL

北慕城南 submitted on 2019-11-26 19:50:03
I want to find the cumulative or running amount of a field and insert it from staging into a target table. My staging structure is something like this:

ea_month   id     amount  ea_year  circle_id
April      92570  1000    2014     1
April      92571  3000    2014     2
April      92572  2000    2014     3
March      92573  3000    2014     1
March      92574  2500    2014     2
March      92575  3750    2014     3
February   92576  2000    2014     1
February   92577  2500    2014     2
February   92578  1450    2014     3

I want my target table to look something like this:

ea_month   id     amount  ea_year  circle_id  cum_amt
February   92576  1000    2014     1          1000
March      92573  3000    2014     1          4000
April      92570  2000    2014     1          6000
February
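
A hedged sketch of the usual approach: compute SUM(amount) as a window function partitioned by circle and ordered chronologically, then insert the result. The table names staging_table and target_table and the month-name-to-date ordering expression are assumptions for illustration:

-- Running total per circle_id, ordering February -> March -> April
-- by converting the month name and year into a date.
INSERT INTO target_table (ea_month, id, amount, ea_year, circle_id, cum_amt)
SELECT ea_month,
       id,
       amount,
       ea_year,
       circle_id,
       SUM(amount) OVER (
           PARTITION BY circle_id
           ORDER BY to_date(ea_month || ' ' || ea_year, 'Month YYYY')
       ) AS cum_amt
FROM staging_table;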

What's the difference between RANK() and DENSE_RANK() functions in Oracle?

流过昼夜 submitted on 2019-11-26 19:24:20
What's the difference between the RANK() and DENSE_RANK() functions? How can I find the nth salary in the following emptbl table?

DEPTNO  EMPNAME  SAL
------------------------------
10      rrr      10000.00
11      nnn      20000.00
11      mmm       5000.00
12      kkk      30000.00
10      fff      40000.00
10      ddd      40000.00
10      bbb      50000.00
10      ccc      50000.00

If the table data contains nulls, what will happen when I try to find the nth salary?

RANK gives you the ranking within your ordered partition. Ties are assigned the same rank, with the next ranking(s) skipped. So, if you have 3 items at rank 2, the next rank listed would be rank 5. DENSE_RANK
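
A hedged sketch against the emptbl above, showing the two functions side by side and the usual nth-highest-salary pattern (n = 3 here is arbitrary; NULLS LAST keeps any null salaries out of the top ranks):

-- Compare the two ranking functions over the same ordering.
SELECT deptno,
       empname,
       sal,
       RANK()       OVER (ORDER BY sal DESC NULLS LAST) AS rnk,
       DENSE_RANK() OVER (ORDER BY sal DESC NULLS LAST) AS dense_rnk
FROM emptbl;

-- nth highest salary (n = 3), letting tied salaries share one rank.
SELECT *
FROM (SELECT e.*,
             DENSE_RANK() OVER (ORDER BY sal DESC NULLS LAST) AS rnk
      FROM emptbl e)
WHERE rnk = 3;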

Is it possible to use user-defined aggregates (CLR) with window functions (OVER)?

五迷三道 submitted on 2019-11-26 18:34:22
Question: Is it possible to use user-defined aggregates (CLR) with window functions (OVER)? I can't find the answer in the documentation: http://technet.microsoft.com/en-us/library/ms190678.aspx

Answer 1: You're right that it's tricky to find anything in the documentation. But searching the Connect website, I managed to find this gem: "Today, you can use CLR aggregates with the OVER clause and PARTITION BY just like regular aggregate functions. Once we have support for window functions..." Which was a response
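
A hedged illustration of what that answer describes, using a hypothetical CLR aggregate dbo.Median and placeholder table/column names (none are from the question): the aggregate can be windowed with PARTITION BY, while ORDER BY inside OVER (a running or framed window) is the part that was not supported for CLR aggregates at the time:

-- Assuming CREATE AGGREGATE dbo.Median has already been run for a CLR UDA,
-- it can be used with OVER (PARTITION BY ...) like a built-in aggregate.
SELECT sales_person_id,
       sale_amount,
       dbo.Median(sale_amount) OVER (PARTITION BY sales_person_id) AS median_per_person
FROM dbo.Sales;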

GROUP BY and aggregate sequential numeric values

|▌冷眼眸甩不掉的悲伤 submitted on 2019-11-26 17:48:24
Using PostgreSQL 9.0. Let's say I have a table containing the fields company, profession and year. I want to return a result which contains unique companies and professions, but aggregates the years (into an array is fine) based on consecutive numeric sequences:

Example Table:
+-----------------------------+
| company | profession | year |
+---------+------------+------+
| Google  | Programmer | 2000 |
| Google  | Sales      | 2000 |
| Google  | Sales      | 2001 |
| Google  | Sales      | 2002 |
| Google  | Sales      | 2004 |
| Mozilla | Sales      | 2002 |
+-----------------------------+

I'm interested in a query which would output
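
A hedged sketch of the standard gaps-and-islands technique: subtracting ROW_NUMBER() from the year yields a constant within each consecutive run, which can then be grouped and collected with array_agg (the table name employment is a placeholder):

-- year - row_number() is constant within each unbroken run of years,
-- so it identifies the "island" to group by.
SELECT company,
       profession,
       array_agg(year ORDER BY year) AS years
FROM (SELECT company,
             profession,
             year,
             year - ROW_NUMBER() OVER (PARTITION BY company, profession
                                       ORDER BY year) AS grp
      FROM employment) s
GROUP BY company, profession, grp
ORDER BY company, profession;

For the example data this would yield one Google/Sales row covering {2000,2001,2002}, a separate row for {2004}, and single rows for the other combinations.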

Postgres window function and group by exception

▼魔方 西西 submitted on 2019-11-26 17:46:43
I'm trying to put together a query that will retrieve a user's statistics (profit/loss) as a cumulative result over a period of time. Here's the query I have so far:

SELECT p.name,
       e.date,
       sum(sp.payout) OVER (ORDER BY e.date) - sum(s.buyin) OVER (ORDER BY e.date) AS "Profit/Loss"
FROM result r
JOIN game g ON r.game_id = g.game_id
JOIN event e ON g.event_id = e.event_id
JOIN structure s ON g.structure_id = s.structure_id
JOIN structure_payout sp ON g.structure_id = sp.structure_id AND r.position = sp.position
JOIN player p ON r.player_id = p.player_id
WHERE p.player_id = 17
GROUP BY p
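
The exception arises because window functions are evaluated after GROUP BY, so every column referenced inside them must either be grouped or wrapped in an aggregate. A hedged sketch of the usual fix, aggregating per event first and then taking the running sum of that per-event aggregate (the nested sum(sum(...)) OVER form is the key change; the rest mirrors the query above):

SELECT p.name,
       e.date,
       -- inner sums are ordinary per-group aggregates; the outer sum is the
       -- window function that accumulates them across event dates
       sum(sum(sp.payout) - sum(s.buyin)) OVER (ORDER BY e.date) AS "Profit/Loss"
FROM result r
JOIN game g              ON r.game_id = g.game_id
JOIN event e             ON g.event_id = e.event_id
JOIN structure s         ON g.structure_id = s.structure_id
JOIN structure_payout sp ON g.structure_id = sp.structure_id
                        AND r.position = sp.position
JOIN player p            ON r.player_id = p.player_id
WHERE p.player_id = 17
GROUP BY p.name, e.date;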

Spark Window Functions - rangeBetween dates

白昼怎懂夜的黑 submitted on 2019-11-26 14:34:39
I have a Spark SQL DataFrame with data, and what I'm trying to get is all the rows preceding the current row within a given date range. So, for example, I want all the rows from the 7 days before a given row. I figured out that I need to use a window function like:

Window \
    .partitionBy('id') \
    .orderBy('start')

and here comes the problem. I want a rangeBetween of 7 days, but there is nothing in the Spark docs I could find on this. Does Spark even provide such an option? For now I'm just getting all the preceding rows with:

.rowsBetween(-sys.maxsize, 0)

but would like to achieve something
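
The commonly cited workaround is to order the window by the date cast to epoch seconds, so the range can be expressed as a number of seconds. A hedged Spark SQL sketch, assuming the DataFrame is registered as a temporary view named events with columns id, start and value (the names and the summed column are placeholders):

-- 7 days = 604800 seconds; with the ordering in epoch seconds, a RANGE frame
-- covers exactly the rows whose start falls within the previous 7 days.
SELECT id,
       start,
       value,
       SUM(value) OVER (
           PARTITION BY id
           ORDER BY CAST(unix_timestamp(start, 'yyyy-MM-dd') AS BIGINT)
           RANGE BETWEEN 604800 PRECEDING AND CURRENT ROW
       ) AS sum_last_7_days
FROM events

The same frame can be built in the DataFrame API by passing the seconds-based bounds to rangeBetween on a window ordered by the cast column.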