window-functions

Spark-SQL Window functions on Dataframe - Finding first timestamp in a group

匆匆过客 submitted on 2019-12-11 12:48:39
Question: I have the dataframe below (say UserData).

uid | region | timestamp
----+--------+----------
a   | 1      | 1
a   | 1      | 2
a   | 1      | 3
a   | 1      | 4
a   | 2      | 5
a   | 2      | 6
a   | 2      | 7
a   | 3      | 8
a   | 4      | 9
a   | 4      | 10
a   | 4      | 11
a   | 4      | 12
a   | 1      | 13
a   | 1      | 14
a   | 3      | 15
a   | 3      | 16
a   | 5      | 17
a   | 5      | 18
a   | 5      | 19
a   | 5      | 20

This data is nothing but a user (uid) travelling across different regions (region) at different times (timestamp). For now, timestamp is shown as an int for simplicity. Note that the above dataframe will not necessarily be in increasing order of timestamp. Also, there may be some rows in between from different …
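A minimal sketch of one common approach (gaps and islands), assuming the goal is the first timestamp of each consecutive run of the same region per uid; the question is cut off before it states the exact output. It runs the SQL through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later) instead of Spark, and the table name user_data and column name ts are placeholders. The window syntax used here should also be expressible in Spark SQL, but that is not exercised.

```python
import sqlite3

# The question's rows (uid, region, timestamp); timestamps run 1..20.
rows = [("a", r, t) for t, r in enumerate(
    [1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 4, 4, 1, 1, 3, 3, 5, 5, 5, 5], start=1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_data (uid TEXT, region INTEGER, ts INTEGER)")
conn.executemany("INSERT INTO user_data VALUES (?, ?, ?)", rows)

# Gaps and islands: a per-user row number minus a per-(user, region) row number
# stays constant within one consecutive visit to a region, so it labels the visit.
query = """
SELECT uid, region, MIN(ts) AS first_ts
FROM (
    SELECT uid, region, ts,
           ROW_NUMBER() OVER (PARTITION BY uid ORDER BY ts)
         - ROW_NUMBER() OVER (PARTITION BY uid, region ORDER BY ts) AS grp
    FROM user_data
) AS runs
GROUP BY uid, region, grp
ORDER BY uid, first_ts
"""
first_visits = conn.execute(query).fetchall()
print(first_visits)
# → [('a', 1, 1), ('a', 2, 5), ('a', 3, 8), ('a', 4, 9), ('a', 1, 13), ('a', 3, 15), ('a', 5, 17)]
```

Because both row numbers sort by ts inside the window, the rows do not need to be stored in timestamp order.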

How to enumerate groups of partitions in my Postgres table with window functions?

て烟熏妆下的殇ゞ submitted on 2019-12-11 12:45:50
Question: Suppose I have a table like this:

 id | part | value
----+------+-------
  1 |    0 |     8
  2 |    0 |     3
  3 |    0 |     4
  4 |    1 |     6
  5 |    0 |    13
  6 |    0 |     4
  7 |    1 |     2
  8 |    0 |    11
  9 |    0 |    15
 10 |    0 |     3
 11 |    0 |     2

I would like to enumerate the groups between rows that have part attribute 1. So I would like to get this:

 id | part | value | number
----+------+-------+--------
  1 |    0 |     8 |      1
  2 |    0 |     3 |      1
  3 |    0 |     4 |      1
  4 |    1 |     6 |      0
  5 |    0 |    13 |      2
  6 |    0 |     4 |      2
  7 |    1 |     2 |      0
  8 |    0 |    11 |      3
  9 |    0 |    15 |      3
 10 |    0 |     3 |      3
 11 | …
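A minimal sketch, assuming each row with part = 1 acts as a separator: a running count of separators seen so far gives the group number, and the separators themselves get 0. It loads the question's rows into an in-memory SQLite database via Python's sqlite3 (SQLite 3.25+ for window functions); the table name t is a placeholder. The SELECT uses standard window syntax that PostgreSQL also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, part INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 0, 8), (2, 0, 3), (3, 0, 4), (4, 1, 6), (5, 0, 13), (6, 0, 4),
     (7, 1, 2), (8, 0, 11), (9, 0, 15), (10, 0, 3), (11, 0, 2)],
)

# For a part = 0 row, the running count of part = 1 rows up to it equals the
# number of separators before it, so 1 + that count is its group number.
query = """
SELECT id, part, value,
       CASE WHEN part = 1 THEN 0
            ELSE 1 + SUM(CASE WHEN part = 1 THEN 1 ELSE 0 END) OVER (ORDER BY id)
       END AS number
FROM t
ORDER BY id
"""
result = conn.execute(query).fetchall()
numbers = [row[3] for row in result]
print(numbers)  # → [1, 1, 1, 0, 2, 2, 0, 3, 3, 3, 3]
```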

Decode maximum number in rows for sql

核能气质少年 submitted on 2019-12-11 11:48:02
Question: I am using #standardsql in BigQuery and trying to code the maximum ranking of each customer_id as 1 and the rest as 0.

This is the query result so far. The query for the ranking is:

ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY booking_date ASC) AS ranking

What I need is to create another column that decodes the maximum ranking of each customer_id as 1 and the rankings below it as 0, just like the table below. Thanks.

Answer 1: Based on your sample data, your ranking is …
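A minimal sketch, assuming "maximum ranking" means the row whose ROW_NUMBER equals the number of rows for that customer (the latest booking_date). The bookings rows are hypothetical because the question's data is not shown, SQLite via Python's sqlite3 (3.25+) stands in for BigQuery, and is_max is a placeholder column name.

```python
import sqlite3

# Hypothetical bookings; the question's actual rows are not visible.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (customer_id TEXT, booking_date TEXT)")
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?)",
    [("c1", "2019-01-05"), ("c1", "2019-02-10"), ("c1", "2019-03-01"),
     ("c2", "2019-01-20"), ("c2", "2019-04-02")],
)

# ranking is the question's ROW_NUMBER; is_max is 1 only where that ranking
# equals the customer's row count, i.e. the highest ranking for the customer.
query = """
SELECT customer_id, booking_date,
       ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY booking_date ASC) AS ranking,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY booking_date ASC)
                 = COUNT(*) OVER (PARTITION BY customer_id)
            THEN 1 ELSE 0 END AS is_max
FROM bookings
ORDER BY customer_id, booking_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

An equivalent flag is CASE WHEN ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY booking_date DESC) = 1 THEN 1 ELSE 0 END, which reverses the sort instead of comparing against a COUNT(*) window.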

How to group consecutive rows by a non-unique value

大兔子大兔子 submitted on 2019-12-11 11:46:49
Question: I have data like this:

table1
id | way | time
---+-----+------
1  | 1   | 00:01
2  | 1   | 00:02
3  | 2   | 00:03
4  | 2   | 00:04
5  | 2   | 00:05
6  | 3   | 00:06
7  | 3   | 00:07
8  | 1   | 00:08
9  | 1   | 00:09

I would like to know during which time interval I was on which way:

desired output
id | way | from  | to
---+-----+-------+------
1  | 1   | 00:01 | 00:02
3  | 2   | 00:03 | 00:05
6  | 3   | 00:06 | 00:07
8  | 1   | 00:08 | 00:09

I tried to use a window function:

SELECT DISTINCT
    first_value(id) OVER w AS id,
    first_value(way) OVER w as way,
    first_value(time) OVER w as from,
    last_value(time) OVER w as to
…
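A minimal gaps-and-islands sketch using the question's rows, run in SQLite via Python's sqlite3 (3.25+) rather than the asker's database. The difference between a global row number and a per-way row number is constant within each consecutive run, so grouping by (way, grp) yields one row per interval. The aliases "from" and "to" are double-quoted because both are SQL keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, way INTEGER, time TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 1, "00:01"), (2, 1, "00:02"), (3, 2, "00:03"), (4, 2, "00:04"),
     (5, 2, "00:05"), (6, 3, "00:06"), (7, 3, "00:07"), (8, 1, "00:08"),
     (9, 1, "00:09")],
)

# grp is constant inside one uninterrupted run of the same way, so the two
# visits to way 1 end up in different groups.
query = """
SELECT MIN(id) AS id, way, MIN(time) AS "from", MAX(time) AS "to"
FROM (
    SELECT id, way, time,
           ROW_NUMBER() OVER (ORDER BY id)
         - ROW_NUMBER() OVER (PARTITION BY way ORDER BY id) AS grp
    FROM table1
) AS s
GROUP BY way, grp
ORDER BY MIN(id)
"""
intervals = conn.execute(query).fetchall()
for row in intervals:
    print(row)
```

Using ROW_NUMBER() OVER (ORDER BY id) instead of id itself keeps the grouping correct even if the ids have gaps. MIN/MAX on the time text works here only because the values are zero-padded HH:MM strings.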

Sum across partitions with window functions

二次信任 submitted on 2019-12-11 11:12:22
Question: I have the following problem:

Time | A  | B  | C  | Sum should be
-----+----+----+----+--------------
1    | a1 | b1 | c1 | a1 + b1 + c1
2    | a2 | b2 | x  | a2 + b1 + c1
3    | a3 | x  | x  | a3 + b2 + c1
4    | x  | b3 | c2 | a3 + b3 + c2

Essentially, the sum needs to be across the most recent value in time for each of the three columns. Each data column doesn't necessarily have a value for the current time. I have tried several approaches using window functions and have been unsuccessful. I have written a stored procedure that does what I need, but it is SLOW.

CREATE OR …
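A minimal sketch, assuming "most recent value" means the latest non-NULL value at or before each time (so at time 2 it takes b2). The numbers are hypothetical stand-ins for a1, b1, c1, …, with NULL in place of x, and the SQL runs in SQLite via Python's sqlite3 (3.25+). Since support for IGNORE NULLS varies between engines, it uses a running COUNT to carry values forward instead.

```python
import sqlite3

# Hypothetical values: a1=1, a2=2, a3=3, b1=10, b2=20, b3=30, c1=100, c2=200.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (t INTEGER, a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [(1, 1, 10, 100), (2, 2, 20, None), (3, 3, None, None), (4, None, 30, 200)],
)

# COUNT(col) skips NULLs, so the running count does not change across a run
# of NULLs; each group therefore holds exactly one non-NULL value, which MAX
# then spreads to every row of the group.
query = """
SELECT t,
       MAX(a) OVER (PARTITION BY ga)
     + MAX(b) OVER (PARTITION BY gb)
     + MAX(c) OVER (PARTITION BY gc) AS total
FROM (
    SELECT t, a, b, c,
           COUNT(a) OVER (ORDER BY t) AS ga,
           COUNT(b) OVER (ORDER BY t) AS gb,
           COUNT(c) OVER (ORDER BY t) AS gc
    FROM readings
) AS g
ORDER BY t
"""
totals = [row[1] for row in conn.execute(query).fetchall()]
print(totals)  # → [111, 122, 123, 233]
```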

select case with “over partition by”

柔情痞子 submitted on 2019-12-11 10:43:20
Question: What's the correct syntax for using CASE in a SELECT with PARTITION BY inside it, or is it even possible? (Using SQL Server 2012.)

a = unique id
b = a string 'xf%'
c = values
d = values
e = values

select case when b like 'xf%' then (sum(c*e)/100*3423 over (partition by a)) end as sumProduct from #myTable

This is something I need to solve; it is part of a problem I asked about previously (sumProduct in sql).

Edit: upon request, adding some sample data and the expected result:

create table #testing (b varchar (20), a date, …
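The likely syntax problem is that OVER has to follow the aggregate call immediately, as in SUM(c * e) OVER (PARTITION BY a), with the division and multiplication applied afterwards. A minimal sketch with hypothetical rows, run in SQLite via Python's sqlite3 (3.25+) rather than SQL Server 2012; the table name my_table is a placeholder, and dividing by 100.0 instead of 100 is an assumption made to avoid integer division.

```python
import sqlite3

# Hypothetical rows (a, b, c, e) standing in for #myTable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (a INTEGER, b TEXT, c INTEGER, e INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [(1, "xf1", 2, 50), (1, "ab", 4, 25), (2, "xf2", 10, 10)],
)

# The windowed SUM is computed per partition of a; CASE then decides which
# rows show it. Rows whose b does not start with 'xf' get NULL.
query = """
SELECT CASE WHEN b LIKE 'xf%'
            THEN SUM(c * e) OVER (PARTITION BY a) / 100.0 * 3423
       END AS sumProduct
FROM my_table
ORDER BY rowid
"""
sum_products = [row[0] for row in conn.execute(query).fetchall()]
print(sum_products)  # → [6846.0, None, 3423.0]
```

Note that the partition sum here includes the non-'xf' rows too; if only 'xf' rows should be summed, the condition would move inside the aggregate: SUM(CASE WHEN b LIKE 'xf%' THEN c * e END) OVER (PARTITION BY a).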

Sum of time difference between rows

北城余情 submitted on 2019-12-11 10:23:15
Question: I have a table which records every status change of an entity:

id  | recordTime          | Status
----+---------------------+-------------
ID1 | 2014-03-01 11:33:00 | Disconnected
ID1 | 2014-03-01 12:13:00 | Connected
ID2 | 2014-03-01 12:21:00 | Connected
ID1 | 2014-03-01 12:24:00 | Disconnected
ID1 | 2014-03-01 12:29:00 | Connected
ID2 | 2014-03-01 12:40:00 | Disconnected
ID2 | 2014-03-01 13:03:00 | Connected
ID2 | 2014-03-01 13:13:00 | Disconnected
ID2 | 2014-03-01 13:29:00 | Connected
ID1 | 2014-03-01 13:30:00 | Disconnected

I need to calculate the total inactive time, i.e. time …
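A minimal sketch, assuming inactive time is the gap between each Disconnected record and the next record for the same id, and that a final Disconnected record with no later entry is left out. It uses the question's rows in SQLite via Python's sqlite3 (3.25+ for LEAD); strftime('%s', …) is SQLite's way of getting epoch seconds, so other engines would need their own date-difference function. status_log and next_time are placeholder names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status_log (id TEXT, recordTime TEXT, status TEXT)")
conn.executemany("INSERT INTO status_log VALUES (?, ?, ?)", [
    ("ID1", "2014-03-01 11:33:00", "Disconnected"),
    ("ID1", "2014-03-01 12:13:00", "Connected"),
    ("ID2", "2014-03-01 12:21:00", "Connected"),
    ("ID1", "2014-03-01 12:24:00", "Disconnected"),
    ("ID1", "2014-03-01 12:29:00", "Connected"),
    ("ID2", "2014-03-01 12:40:00", "Disconnected"),
    ("ID2", "2014-03-01 13:03:00", "Connected"),
    ("ID2", "2014-03-01 13:13:00", "Disconnected"),
    ("ID2", "2014-03-01 13:29:00", "Connected"),
    ("ID1", "2014-03-01 13:30:00", "Disconnected"),
])

# LEAD pairs every record with the next record of the same id; only the
# Disconnected rows that have a successor contribute their gap in seconds.
query = """
SELECT id,
       SUM(CAST(strftime('%s', next_time) AS INTEGER)
         - CAST(strftime('%s', recordTime) AS INTEGER)) / 60 AS inactive_minutes
FROM (
    SELECT id, recordTime, status,
           LEAD(recordTime) OVER (PARTITION BY id ORDER BY recordTime) AS next_time
    FROM status_log
) AS s
WHERE status = 'Disconnected' AND next_time IS NOT NULL
GROUP BY id
ORDER BY id
"""
inactive = conn.execute(query).fetchall()
print(inactive)  # → [('ID1', 45), ('ID2', 39)]
```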

How to rank over partition in MySQL

女生的网名这么多〃 submitted on 2019-12-11 09:09:43
Question: I'm new to MySQL. I face a problem that I could solve in SQL Server, but I can't do it in MySQL. Here is my case:

MyTable:

Name | Price
-----+------
abs  | 100
abs  | 200
abs  | 60
trx  | 19
trx  | 20
abs  | 10
qwe  | 25
qwe  | 50
qwe  | 10
qwe  | 10

Expected result:

Name | Price | Rank
-----+-------+-----
abs  | 200   | 4
abs  | 100   | 3
abs  | 60    | 2
abs  | 10    | 1
qwe  | 50    | 4
qwe  | 25    | 3
qwe  | 10    | 2
qwe  | 10    | 1
trx  | 20    | 2
trx  | 19    | 1

Could anyone help me write a MySQL query that produces this result?

Answer 1: Using variables, you can find the rank, like this: SELECT Name, …
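MySQL 8.0 added window functions, so on that version the variable trick from the answer can be replaced by ROW_NUMBER() with PARTITION BY. A minimal sketch with the question's rows, run in SQLite via Python's sqlite3 (3.25+) rather than MySQL; the alias is backtick-quoted because RANK is a reserved word in MySQL 8.0, and SQLite accepts backticks as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Name TEXT, Price INTEGER)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)", [
    ("abs", 100), ("abs", 200), ("abs", 60), ("trx", 19), ("trx", 20),
    ("abs", 10), ("qwe", 25), ("qwe", 50), ("qwe", 10), ("qwe", 10),
])

# ROW_NUMBER (not RANK) so the two qwe rows priced 10 get distinct ranks 1 and 2,
# as in the expected output; ascending Price gives the cheapest row rank 1.
query = """
SELECT Name, Price,
       ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Price) AS `Rank`
FROM MyTable
ORDER BY Name, `Rank` DESC
"""
ranked = conn.execute(query).fetchall()
for row in ranked:
    print(row)
```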

Delete all rows but one with the greatest value per group

和自甴很熟 submitted on 2019-12-11 08:24:03
Question: So, I recently asked a question, "Update using a subquery with aggregates and groupby in Postgres", and it turns out I was going about my issue with flawed logic. In the same scenario as that question, instead of updating all the rows to have the max quantity, I'd like to delete the rows that don't have the max quantity (and any duplicate max quantities). Essentially, I just need to convert the statement below into a DELETE statement that preserves only the largest quantity per item_name. I'm …
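A minimal sketch, assuming a hypothetical items(id, item_name, quantity) table with a unique id: number the rows of each item_name from the largest quantity down and delete everything except row 1, which also removes duplicate maxima. It runs in SQLite via Python's sqlite3 (3.25+); the DELETE itself uses only standard window syntax that PostgreSQL also accepts.

```python
import sqlite3

# Hypothetical items table; the question's schema is only partly visible.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, item_name TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    (1, "apple", 5), (2, "apple", 9), (3, "apple", 9), (4, "pear", 3),
])

# Ties on the max quantity are broken by the lower id, so exactly one row
# (rn = 1) survives per item_name.
conn.execute("""
DELETE FROM items
WHERE id NOT IN (
    SELECT id FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY item_name ORDER BY quantity DESC, id) AS rn
        FROM items
    ) AS ranked
    WHERE rn = 1
)
""")
remaining = conn.execute("SELECT id, item_name, quantity FROM items ORDER BY id").fetchall()
print(remaining)  # → [(2, 'apple', 9), (4, 'pear', 3)]
```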

How can I get the second max salary using “over(partition by)” in Oracle SQL?

孤街浪徒 submitted on 2019-12-11 07:43:36
Question: I already get it with this query:

SELECT * FROM (
    SELECT emp_id, salary, row_number() over(order by salary desc) AS rk
    FROM test_qaium
) where rk=2;

But a friend of mine asked me to find the second MAX salary from the employees table, and it must use "over(partition by)" in Oracle SQL. Can anybody please help me, and explain the concept of "partition by" in Oracle SQL?

Answer 1: Oracle setup:

CREATE TABLE test_qaium ( emp_id, salary, department_id ) AS
SELECT 1, 10000, 1 FROM DUAL UNION ALL
SELECT 2, 20000, …
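A minimal sketch of a per-department variant, with hypothetical rows shaped like the answer's test_qaium setup (only its first rows are visible). PARTITION BY department_id restarts the ranking for each department, and DENSE_RANK makes "second max" mean the second distinct salary. It runs in SQLite via Python's sqlite3 (3.25+) rather than Oracle; the subquery alias is written without AS because Oracle does not accept AS before a table alias.

```python
import sqlite3

# Hypothetical rows (emp_id, salary, department_id).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_qaium (emp_id INTEGER, salary INTEGER, department_id INTEGER)")
conn.executemany("INSERT INTO test_qaium VALUES (?, ?, ?)", [
    (1, 10000, 1), (2, 20000, 1), (3, 30000, 1),
    (4, 15000, 2), (5, 25000, 2), (6, 25000, 2),
])

# DENSE_RANK gives both 25000 earners in department 2 rank 1, so rank 2 is
# the next distinct salary (15000) rather than the second tied employee.
query = """
SELECT emp_id, salary, department_id
FROM (
    SELECT emp_id, salary, department_id,
           DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rk
    FROM test_qaium
) ranked
WHERE rk = 2
ORDER BY department_id
"""
second_max = conn.execute(query).fetchall()
print(second_max)  # → [(2, 20000, 1), (4, 15000, 2)]
```

Dropping the PARTITION BY clause gives the company-wide second-highest salary, as the asker's query does, except that ROW_NUMBER does not collapse tied salaries into one rank.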