window-functions

Select latest timestamp record after a window operation for every group in the data with Spark Scala

懵懂的女人 submitted on 2019-12-08 11:33:22
Question: I ran a count of attempts by (user, app) over a one-day time window (86400 seconds). I want to extract the rows carrying the latest timestamp together with the count, and drop the unnecessary earlier counts; the answer has to respect the time window. One user with one device can make multiple attempts in a day or a week, and I want to retrieve those particular moments with the final count in every specific window. My initial dataset looks like this:

    val df = sc.parallelize(Seq(
      ("user1", "iphone", "2017-12-22 10:06
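A minimal sketch of the usual pattern, in Spark SQL (window functions in Spark 1.6 require a HiveContext): number the rows within each group by timestamp descending and keep row 1. The table and column names (attempts, user_id, ts, cnt) and a precomputed win_start column marking each row's day window are assumptions for illustration.

    SELECT user_id, device, ts, cnt
    FROM (
      SELECT user_id, device, ts, cnt,
             row_number() OVER (PARTITION BY user_id, device, win_start
                                ORDER BY ts DESC) AS rn
      FROM attempts                 -- one row per attempt, cnt = count so far
    ) ranked
    WHERE rn = 1                    -- latest timestamp in each (user, device, window)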

How to use a window function to determine when to perform different tasks in Hive or Postgres?

百般思念 submitted on 2019-12-08 08:38:31
I am new to SQL and need to be able to solve the following problem in both Hive and Postgres.

Data

I have some data showing the start day and end day for different pre-prioritised tasks per person:

        person  task_key  start_day  end_day
    1   Kate    A         1          5
    2   Kate    B         1          5
    3   Adam    A         1          5
    4   Adam    B         2          5
    5   Eve     A         2          5
    6   Eve     B         1          5
    7   Jason   A         1          5
    8   Jason   B         4          5
    9   Jason   C         3          5
    10  Jason   D         5          5
    11  Jason   E         4          5

NOTE: task_key is ordered so that higher letters have higher priority.

Question

I need to work out which task each person should be working on each day, with the condition that: Higher lettered tasks take priority
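One common shape for the answer, sketched here for Postgres (Hive has no generate_series, so a pre-built calendar table would take its place): expand each task into one row per day it is active, then keep the highest-priority task per person per day. The table name tasks and the 1..5 day horizon are taken from the sample; the DESC sort encodes "higher letters win".

    SELECT day, person, task_key
    FROM (
      SELECT d.day, t.person, t.task_key,
             row_number() OVER (PARTITION BY t.person, d.day
                                ORDER BY t.task_key DESC) AS rn
      FROM tasks t
      JOIN generate_series(1, 5) AS d(day)        -- one row per day in the horizon
        ON d.day BETWEEN t.start_day AND t.end_day
    ) x
    WHERE rn = 1                                   -- highest-lettered active task
    ORDER BY person, day;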

Semantic exception error in HIVE while using last_value window function

末鹿安然 submitted on 2019-12-08 06:17:10
Question: I have a table with the following data:

    dt          device    id             count
    2018-10-05  computer  7541185957382  6
    2018-10-20  computer  7541185957382  3
    2018-10-14  computer  7553187775734  6
    2018-10-17  computer  7553187775734  10
    2018-10-21  computer  7553187775734  2
    2018-10-22  computer  7549187067178  5
    2018-10-20  computer  7553187757256  3
    2018-10-11  computer  7549187067178  10

I want to get the last and first dt for each id. Hence, I used the window functions first_value and last_value as follows:

    select id, last_value(dt)
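The usual culprit with last_value is the default window frame, which ends at the current row, so last_value just returns the current row's dt. A sketch with an explicit frame (the table name counts is an assumption); a plain GROUP BY with min(dt) and max(dt) would also do here.

    SELECT DISTINCT id,
           first_value(dt) OVER (PARTITION BY id ORDER BY dt) AS first_dt,
           last_value(dt)  OVER (PARTITION BY id ORDER BY dt
                                 ROWS BETWEEN UNBOUNDED PRECEDING
                                          AND UNBOUNDED FOLLOWING) AS last_dt
    FROM counts;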

“Cumulative difference” function in R

天涯浪子 submitted on 2019-12-08 06:11:42
Question: Is there a pre-existing function to calculate the cumulative difference between consecutive values? Context: this is to estimate the change in altitude that a person has to undergo in both directions on a journey generated by CycleStreet.net. Reproducible example:

    x <- c(27, 24, 24, 27, 28)  # create the data

Method 1: for loop

    for(i in 2:length(x)){      # for loop way
      if(i == 2) cum_change <- 0
      cum_change <- Mod(x[i] - x[i - 1]) + cum_change
      cum_change
    }
    ## 7

Method 2: vectorised

    diffs <- Mod(x[
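Since this page's theme is window functions, the same total can also be computed in SQL with lag(), provided the points are stored with an ordering column. The table and column names (route_points, ord, altitude) are assumptions.

    SELECT SUM(ABS(altitude - prev_alt)) AS cum_change
    FROM (
      SELECT altitude,
             lag(altitude) OVER (ORDER BY ord) AS prev_alt  -- previous point's altitude
      FROM route_points
    ) t;   -- lag() is NULL on the first row and SUM() skips it
    -- for the sample 27, 24, 24, 27, 28 this gives 3 + 0 + 3 + 1 = 7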

Rank based on sequence of dates

梦想与她 submitted on 2019-12-08 02:58:11
Question: I have data as below:

    Heading  Date
    A        2009-02-01
    B        2009-02-03
    c        2009-02-05
    d        2009-02-06
    e        2009-02-08

I need ranks as below:

    Heading  Date        Rank
    A        2009-02-01  1
    B        2009-02-03  2
    c        2009-02-05  1
    d        2009-02-06  2
    e        2009-02-07  3

I need the rank based on the date: while the dates are consecutive the rank should run 1, 2, 3, and so on, and whenever there is a break in the dates the rank should start over at 1. Can anyone help me with this?

Answer 1:

    SELECT heading, thedate,
           row_number() OVER (PARTITION BY grp ORDER BY thedate) AS rn
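The truncated answer is the classic gaps-and-islands trick: subtract a running row number (as days) from each date, so that every run of consecutive dates shares one grp value, then rank within grp. A Postgres sketch, with the table name tbl assumed:

    SELECT heading, thedate,
           row_number() OVER (PARTITION BY grp ORDER BY thedate) AS rnk
    FROM (
      SELECT heading, thedate,
             thedate - (row_number() OVER (ORDER BY thedate))::int AS grp
      FROM tbl            -- date minus row number is constant across a run
    ) sub
    ORDER BY thedate;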

Optimizing SUM OVER PARTITION BY for several hierarchical groups

旧街凉风 submitted on 2019-12-07 21:12:24
Question: I have a table like the one below:

    Region  Country  Manufacturer  Brand  Period  Spend
    R1      C1       M1            B1     2016    5
    R1      C1       M1            B1     2017    10
    R1      C1       M1            B1     2017    20
    R1      C1       M1            B2     2016    15
    R1      C1       M1            B3     2017    20
    R1      C2       M1            B1     2017    5
    R1      C2       M2            B4     2017    25
    R1      C2       M2            B5     2017    30
    R2      C3       M1            B1     2017    35
    R2      C3       M2            B4     2017    40
    R2      C3       M2            B5     2017    45

I need to find SUM([Spend]) over different groups, as follows (see the sketch after this list):

- Total Spend over all the rows in the whole table
- Total Spend for each Region
- Total Spend for each Region and Country group
- Total Spend for each
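The unoptimized baseline is simply one window per level of the hierarchy, which is presumably what the optimization question starts from. A sketch, with the table name spend_data assumed:

    SELECT Region, Country, Manufacturer, Brand, Period, Spend,
           SUM(Spend) OVER ()                                           AS total_all,
           SUM(Spend) OVER (PARTITION BY Region)                        AS total_region,
           SUM(Spend) OVER (PARTITION BY Region, Country)               AS total_country,
           SUM(Spend) OVER (PARTITION BY Region, Country, Manufacturer) AS total_manufacturer
    FROM spend_data;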

Selecting every Nth row per user in Postgres

此生再无相见时 submitted on 2019-12-07 20:17:54
Question: I was using this SQL statement:

    SELECT "dateId", "userId", "Salary"
    FROM (
      SELECT *, (row_number() OVER (ORDER BY "userId", "dateId")) % 2 AS rn
      FROM user_table
    ) sa
    WHERE sa.rn = 1
      AND "userId" = 789
      AND "Salary" > 0;

But every time the table gets new rows, the result of the query changes. Am I missing something?

Answer 1: Assuming that ("dateId", "userId") is unique and new rows always have a bigger (later) dateId. After some comments, what I think you need:

    SELECT "dateId", "userId", "Salary"
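The instability comes from numbering the whole table: any new row shifts the parity of every row after it. Numbering within the user, after filtering to that user, keeps the parity stable as long as new rows only arrive with later dateId values, which is exactly what the answer assumes. A sketch:

    SELECT "dateId", "userId", "Salary"
    FROM (
      SELECT *,
             row_number() OVER (PARTITION BY "userId"
                                ORDER BY "dateId") % 2 AS rn
      FROM user_table
      WHERE "userId" = 789             -- filter first, then number
    ) sa
    WHERE sa.rn = 1                     -- every other row: 1st, 3rd, 5th, ...
      AND "Salary" > 0;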

PostgreSQL IGNORE NULLS in window functions

梦想的初衷 submitted on 2019-12-07 13:48:26
Question: The left panel shows the data without IGNORE NULLS, the right panel the data with IGNORE NULLS (the screenshots are not reproduced in this excerpt); the right-hand variant is what I need in PostgreSQL. That is, I need to emulate Oracle's IGNORE NULLS option for the window functions LEAD and LAG in PostgreSQL:

    SELECT empno, ename, orig_salary,
           LAG(orig_salary, 1, 0) IGNORE NULLS
             OVER (ORDER BY orig_salary) AS sal_prev
    FROM tbl_lead;

Where there are NULLs, it should return the latest non-null value. I have tried doing it with a PostgreSQL user-defined aggregate function, but it's rather hard to
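One well-known emulation (a sketch, needing Postgres 9.4+ for cardinality) aggregates the preceding values into an array, strips the NULLs, and takes the last element; indexing an empty or NULL array yields NULL, and COALESCE restores the 0 default from the Oracle call:

    SELECT empno, ename, orig_salary,
           COALESCE(vals[cardinality(vals)], 0) AS sal_prev
    FROM (
      SELECT empno, ename, orig_salary,
             array_remove(
               array_agg(orig_salary) OVER (ORDER BY orig_salary
                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
               NULL) AS vals             -- all preceding values, NULLs dropped
      FROM tbl_lead
    ) t;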

How to divide the value of the current row by the following one?

≡放荡痞女 submitted on 2019-12-07 13:46:15
Question: In Spark-Sql version 1.6, using DataFrames, is there a way to calculate, for a specific column and for every row, the result of dividing the current row's value by the next one? For example, if I have a table with one column, like so:

    Age
    100
    50
    20
    4

I'd like the following output:

    Fraction
    2
    2.5
    5

The last row is dropped because it has no "next row" to divide by. Right now I am doing it by ranking the table and joining it with itself on rank = rank + 1. Is there a better way to do
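lead() does this in one pass instead of a rank-and-self-join. A Spark SQL sketch (a HiveContext is required for window functions in 1.6); the table name ages is an assumption, and so is ordering by Age descending, which reproduces the sample, since a real table would need an explicit ordering column:

    SELECT Age, Fraction
    FROM (
      SELECT Age,
             CAST(Age AS DOUBLE) / lead(Age) OVER (ORDER BY Age DESC) AS Fraction
      FROM ages
    ) t
    WHERE Fraction IS NOT NULL;   -- drops the last row, which has no next row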

Can I use window functions in doctrine 2?

心不动则不痛 submitted on 2019-12-07 06:56:31
Question:

    SELECT invoice.id, COUNT(slip.id),
           SUM(projected_minutes) OVER (PARTITION BY task.id) AS projected_minutes
    FROM invoice
    INNER JOIN task ON task.invoice_id = invoice.id
    LEFT JOIN slip ON slip.task_id = task.id

The query above is PostgreSQL, and I want to convert it to DQL, but I can't find any documentation for window functions in DQL. Is this natively supported in Doctrine, or would I have to create a custom DQL function for it?

Answer 1: There is no support for this vendor-specific function in