window-functions

SQL window function with a where clause?

流过昼夜 submitted on 2019-12-05 01:54:32
I'm trying to correlate two types of events for users. I want to see all event "B"s along with the most recent event "A" for that user prior to the "B" event. How would one accomplish this? In particular, I'm trying to do this in Postgres. I was hoping it was possible to use a "where" clause in a window function, in which case I could essentially do a LAG() with a "where event='A'", but that doesn't seem to be possible. Any recommendations?

Data example:

|user |time|event|
|-----|----|-----|
|Alice|1   |A    |
|Bob  |2   |A    |
|Alice|3   |A    |
|Alice|4   |B    |
|Bob  |5   |B    |
|Alice|6   |B    |

Desired result:

|user
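A common workaround, sketched below under the assumption that the data lives in a table called events with the columns shown above: since LAG() cannot take a filter, carry the time of the most recent 'A' forward with a conditional MAX() over a frame that excludes the current row, then keep only the 'B' rows.

```sql
-- Hedged sketch; "events" is an assumed table name, columns as in the example above.
SELECT "user", time, event, last_a_time
FROM (
    SELECT "user", time, event,
           MAX(CASE WHEN event = 'A' THEN time END) OVER (
               PARTITION BY "user"
               ORDER BY time
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS last_a_time        -- time of the latest preceding 'A' per user
    FROM events
) t
WHERE event = 'B';
```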

Select a row of first non-null values in a sparse table

烂漫一生 submitted on 2019-12-05 01:13:59
Using the following table:

 A | B    | C    | ts
---+------+------+------------------
 1 | null | null | 2016-06-15 10:00
 4 | null | null | 2016-06-15 11:00
 4 | 9    | null | 2016-06-15 12:00
 5 | 1    | 7    | 2016-06-15 13:00

How do I select the first non-null value of each column in a running window of N rows? "First" as defined by the order of timestamps in column ts. Querying the above table would result in:

 A | B | C
---+---+---
 1 | 9 | 7

The window function first_value() allows for a rather short and elegant solution:

SELECT first_value(a) OVER (ORDER BY a IS NULL, ts) AS a
     , first_value(b) OVER (ORDER
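The quoted answer is cut off above; a hedged reconstruction of the full pattern it points to might look like the following (table name sparse_tbl is assumed; to restrict it to the last N rows, the FROM clause would need a LIMIT N subquery ordered by ts).

```sql
-- Hedged sketch: non-null rows sort first because "x IS NULL" is false for them,
-- so first_value() returns the earliest non-null value of each column.
SELECT first_value(a) OVER (ORDER BY a IS NULL, ts) AS a
     , first_value(b) OVER (ORDER BY b IS NULL, ts) AS b
     , first_value(c) OVER (ORDER BY c IS NULL, ts) AS c
FROM   sparse_tbl
LIMIT  1;
```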

Filtering out duplicate subsequent records in a SELECT

时间秒杀一切 submitted on 2019-12-04 22:47:28
Question: (PostgreSQL 8.4) Table "trackingMessages" stores tracking events between mobile devices (tm_nl_mobileId) and fixed devices (tm_nl_fixedId).

CREATE TABLE trackingMessages
(
  tm_id          SERIAL PRIMARY KEY,  -- PK
  tm_nl_mobileId INTEGER,             -- FK to mobile
  tm_nl_fixedId  INTEGER,             -- FK to fixed
  tm_date        INTEGER,             -- Network time
  tm_messageType INTEGER,             -- 0=disconnect, 1=connect
  CONSTRAINT tm_unique_row
    UNIQUE (tm_nl_mobileId, tm_nl_fixedId, tm_date, tm_messageType)
);

The problem here is that it's possible that
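The question is truncated above, but based on its title ("filtering out duplicate subsequent records"), the usual shape of a solution is to compare each row's message type with the previous one for the same mobile/fixed pair via lag() and keep only the rows where it changed. A hedged sketch (PostgreSQL 8.4 introduced window functions, so lag() is available):

```sql
-- Hedged sketch: keep only rows whose message type differs from the previous
-- message exchanged between the same mobile and fixed device.
SELECT tm_id, tm_nl_mobileId, tm_nl_fixedId, tm_date, tm_messageType
FROM (
    SELECT t.*,
           lag(tm_messageType) OVER (
               PARTITION BY tm_nl_mobileId, tm_nl_fixedId
               ORDER BY tm_date
           ) AS prev_type
    FROM trackingMessages t
) sub
WHERE prev_type IS DISTINCT FROM tm_messageType;
```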

What is the difference between rowsBetween and rangeBetween?

情到浓时终转凉″ submitted on 2019-12-04 22:35:10
From the PySpark docs, rangeBetween:

rangeBetween(start, end)
  Defines the frame boundaries, from start (inclusive) to end (inclusive). Both start and end are relative from the current row. For example, "0" means "current row", while "-1" means one off before the current row, and "5" means the five off after the current row.
  Parameters:
    start – boundary start, inclusive. The frame is unbounded if this is -sys.maxsize (or lower).
    end – boundary end, inclusive. The frame is unbounded if this is sys.maxsize (or higher).
  New in version 1.4.

while rowsBetween:

rowsBetween(start, end)
  Defines the
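In short: rowsBetween counts physical rows around the current row, while rangeBetween works on the value of the ORDER BY expression, so all peers with the same ordering value fall into the same frame. A hedged illustration using the equivalent Spark SQL frame syntax over a hypothetical table t(k, v):

```sql
-- ROWS:  frame is "the previous physical row and the current row".
-- RANGE: frame is "all rows whose v lies within 1 below the current row's v,
--        up to and including the current value" (peers with equal v are included).
SELECT k, v,
       sum(v) OVER (ORDER BY v ROWS  BETWEEN 1 PRECEDING AND CURRENT ROW) AS sum_rows,
       sum(v) OVER (ORDER BY v RANGE BETWEEN 1 PRECEDING AND CURRENT ROW) AS sum_range
FROM t;
```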

Spark SQL Window over interval of between two specified time boundaries - between 3 hours and 2 hours ago

无人久伴 submitted on 2019-12-04 19:11:36
What is the proper way of specifying a window interval in Spark SQL, using two predefined boundaries? I am trying to sum up values from my table over a window of "between 3 hours ago and 2 hours ago". When I run this query:

select *,
       sum(value) over (
           partition by a, b
           order by cast(time_value as timestamp)
           range between interval 2 hours preceding and current row
       ) as sum_value
from my_temp_table;

that works. I get the results I expect, i.e. sums of values that fall into a 2-hour rolling window. Now, what I need is to have that rolling window not be bound to the current row but to take into
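A hedged sketch of where the question is heading (this is the commonly cited fix rather than the original accepted answer reproduced verbatim): Spark SQL allows an interval literal on both frame bounds, so the frame can end 2 hours before the current row instead of at the current row.

```sql
-- Frame covers values between 3 hours and 2 hours before the current row's time.
select *,
       sum(value) over (
           partition by a, b
           order by cast(time_value as timestamp)
           range between interval 3 hours preceding and interval 2 hours preceding
       ) as sum_value
from my_temp_table;
```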

Group by end of period instead of start date

坚强是说给别人听的谎言 submitted on 2019-12-04 18:43:51
I'm looking to aggregate data by the end date of a dataset with some leading period, rather than by the start. For example, I want to query a table and return the count of matching results in the 30 days PRIOR to the end date shown in the results. The original table would contain ONLY the date a sale was made (timestamp). Example:

sales_timestamp
-------------------
2015-08-05 12:00:00
2015-08-06 13:00:00
2015-08-25 12:31:00
2015-08-26 01:02:00
2015-08-27 02:03:00
2015-08-29 04:23:00
2015-09-01 12:00:00
2015-09-02 12:00:00
2015-09-08 00:00:00

An example of the resulting query output would be:
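Since the expected output is cut off above, what follows is only a hedged sketch of one PostgreSQL approach: for each sale, count the sales that fall in the 30 days ending at that sale's timestamp (the table name sales is an assumption).

```sql
-- Hedged sketch: correlated subquery counting sales in the trailing 30 days.
SELECT s.sales_timestamp,
       (SELECT count(*)
        FROM sales s2
        WHERE s2.sales_timestamp >  s.sales_timestamp - interval '30 days'
          AND s2.sales_timestamp <= s.sales_timestamp) AS sales_prior_30_days
FROM sales s
ORDER BY s.sales_timestamp;
```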

SQL issue - calculate max days sequence

早过忘川 submitted on 2019-12-04 13:46:01
Question: There is a table with visits data: uid (INT) | created_at (DATETIME). I want to find how many days in a row a user has visited our app. So for instance:

SELECT DISTINCT DATE(created_at) AS d
FROM visits
WHERE uid = 123

will return:

d
------------
2012-04-28
2012-04-29
2012-04-30
2012-05-03
2012-05-04

There are 5 records and two intervals - 3 days (28 - 30 Apr) and 2 days (3 - 4 May). My question is how to find the maximum number of days that a user has visited the app in a row (3 days in the
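The question text is truncated, but the standard "gaps and islands" technique answers it: subtract a row number from each distinct visit date; the difference is constant within a consecutive run of days, so grouping by it yields the length of each streak. A hedged sketch in PostgreSQL syntax (the question's schema looks MySQL-flavoured, so the date arithmetic may need adapting):

```sql
-- Hedged sketch of the gaps-and-islands approach for uid = 123.
SELECT max(streak_len) AS max_days_in_a_row
FROM (
    SELECT grp, count(*) AS streak_len
    FROM (
        SELECT d,
               d - (row_number() OVER (ORDER BY d))::int AS grp  -- constant per streak
        FROM (SELECT DISTINCT created_at::date AS d
              FROM visits
              WHERE uid = 123) AS days
    ) AS numbered
    GROUP BY grp
) AS streaks;
```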

Window function in MySQL queries

蓝咒 submitted on 2019-12-04 12:44:36
Is there a way to use window functions in MySQL queries dynamically within a SELECT query itself? (I know for a fact that it is possible in PostgreSQL.) For example, here is the equivalent query in PostgreSQL:

SELECT c_server_ip, c_client_ip,
       sum(a_num_bytes_sent) OVER (PARTITION BY c_server_ip)
FROM network_table;

However, what would be the corresponding query in MySQL? Hope this might work:

select A.c_server_ip, A.c_client_ip, B.mySum
from network_table A,
     (select c_server_ip, sum(a_num_bytes_sent) as mySum
      from network_table
      group by c_server_ip) as B
where A.c_server_ip = B.c_server_ip;
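Worth noting, as a hedged side remark: MySQL 8.0 and later support window functions natively, so on a current MySQL the PostgreSQL-style query above should work essentially as written.

```sql
-- Runs on MySQL 8.0+; each client row annotated with the per-server total.
SELECT c_server_ip,
       c_client_ip,
       SUM(a_num_bytes_sent) OVER (PARTITION BY c_server_ip) AS sum_bytes_sent
FROM network_table;
```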

ROW_NUMBER Without ORDER BY

亡梦爱人 submitted on 2019-12-04 09:59:10
Question: I have to add a row number to my existing query so that I can track how much data has been added into Redis. If my query fails, I can start from the row number recorded in the other table.

Query to get data starting after row 1000 of the table:

SELECT *
FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rn) AS X
WHERE rn > 1000

The query is working fine. Is there any way to get the row number without using ORDER BY? What is "select 1" here? Is the query optimized, or can I do it in other ways?
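For context (hedged, not part of the original thread): ORDER BY (SELECT 1), like the equivalent ORDER BY (SELECT NULL), is a dummy constant ordering. ROW_NUMBER() always requires an ORDER BY, so this idiom tells the engine "number the rows, but I don't care in what order"; the numbering comes out arbitrary but cheap. The table name my_table below is a placeholder, not from the question.

```sql
-- Hedged sketch: arbitrary row numbering without a meaningful ORDER BY column.
SELECT *
FROM (
    SELECT *,
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
    FROM my_table
) AS X
WHERE rn > 1000;
```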

spark sql window function lag

元气小坏坏 submitted on 2019-12-04 09:10:24
Question: I am looking at the window slide function for a Spark DataFrame in Spark SQL, Scala. I have a dataframe with columns Col1, Col2, Col3, date and volume.

Col1 Col2 Col3 date    volume  new_col
               201601  100.5
               201602  120.6   100.5
               201603  450.2   120.6
               201604  200.7   450.2
               201605  121.4   200.7

Now I want to add a new column named new_col with the volume value slid down by one row, as shown above. I tried the below option to use the window function:

val windSldBrdrxNrx_df = df.withColumn("Prev_brand_rx", lag("Prev_brand_rx", 1))

Can anyone
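The question is cut off, but the visible attempt lags a column that does not exist yet and omits the window specification. A hedged sketch of what is presumably intended, written as Spark SQL over a temp view (in the DataFrame API the equivalent is lag(col("volume"), 1).over(Window.orderBy("date"))):

```sql
-- Hedged sketch: register the DataFrame first, e.g. df.createOrReplaceTempView("my_df"),
-- then lag the existing "volume" column by one row, ordered by date.
SELECT Col1, Col2, Col3, date, volume,
       lag(volume, 1) OVER (ORDER BY date) AS new_col
FROM my_df;
```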