window-functions

Islands and Gaps Issue

安稳与你 submitted on 2019-12-22 08:44:43
Question: Backstory: I have a database that has data points of drivers in trucks which also contain the. While in a truck, the driver can have a 'driverstatus'. What I'd like to do is group these statuses by driver and truck. So far, I've tried using LAG/LEAD to help. The reason for this is so that I can tell when a driverstatus change occurs, and then I can mark that row as having the last datetime of that status. That in itself is insufficient, because I need to group the statuses by their status and date
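A common approach to this gaps-and-islands problem is to subtract a per-status row number from a per-driver/truck row number; consecutive rows with the same status then share a group key that can be aggregated. A minimal sketch, assuming a hypothetical table driverlogs with columns driverid, truckid, driverstatus, and statusdatetime:

    -- Rows in the same consecutive status run get the same grp value.
    SELECT driverid, truckid, driverstatus,
           MIN(statusdatetime) AS status_start,
           MAX(statusdatetime) AS status_end
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY driverid, truckid
                                  ORDER BY statusdatetime)
             - ROW_NUMBER() OVER (PARTITION BY driverid, truckid, driverstatus
                                  ORDER BY statusdatetime) AS grp
        FROM driverlogs
    ) runs
    GROUP BY driverid, truckid, driverstatus, grp
    ORDER BY driverid, truckid, status_start;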

Group by end of period instead of start date

我怕爱的太早我们不能终老 submitted on 2019-12-21 22:42:22
Question: I'm looking to aggregate data by the end date of a dataset with some leading period, rather than by the start. For example, I want to query a table and return the count of matching results in the 30 days PRIOR to the end date shown in the results. The original table would contain ONLY the date a sale was made (timestamp). Example:

    sales_timestamp
    ------------------
    2015-08-05 12:00:00
    2015-08-06 13:00:00
    2015-08-25 12:31:00
    2015-08-26 01:02:00
    2015-08-27 02:03:00
    2015-08-29 04:23:00
    2015-09
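One way to express a trailing 30-day count is a window frame defined over a range of timestamps. A minimal sketch, assuming a hypothetical table named sales and PostgreSQL 11 or later (which allows RANGE frames with an interval offset):

    SELECT sales_timestamp,
           COUNT(*) OVER (
               ORDER BY sales_timestamp
               RANGE BETWEEN INTERVAL '30 days' PRECEDING AND CURRENT ROW
           ) AS sales_in_prior_30_days
    FROM sales;

On older versions the same count can be computed with a correlated subquery or a self-join over the 30-day interval.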

Selecting positive aggregate value and ignoring negative in Postgres SQL

大城市里の小女人 submitted on 2019-12-21 16:59:38
Question: I must apply a certain transformation fn(argument). Here argument is equal to value, except when value is negative. When you get the first negative value, you "wait" until it sums up with consecutive values and this sum becomes positive. Then you do fn(argument). See the table I want to get:

    value   argument
    -----------------
      2        2
      3        3
    -10        0
      4        0
      3        0
     10        7
      1        1

I could have summed all values and applied fn to the sum, but fn can be different for different rows and it is essential to know the
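This kind of carry-forward of a negative balance is order-dependent, so a plain window function is not enough; a recursive CTE that threads the running deficit from row to row is one option. A minimal sketch, assuming a hypothetical table t(id, value) ordered by id:

    WITH RECURSIVE ordered AS (
        SELECT ROW_NUMBER() OVER (ORDER BY id) AS rn, value
        FROM t
    ),
    running AS (
        -- First row: argument is the value floored at 0, carry keeps any deficit.
        SELECT rn, value,
               GREATEST(value, 0) AS argument,
               LEAST(value, 0)    AS carry
        FROM ordered
        WHERE rn = 1
        UNION ALL
        -- Each later row absorbs the accumulated deficit before flooring at 0.
        SELECT o.rn, o.value,
               GREATEST(o.value + r.carry, 0),
               LEAST(o.value + r.carry, 0)
        FROM ordered o
        JOIN running r ON o.rn = r.rn + 1
    )
    SELECT value, argument
    FROM running
    ORDER BY rn;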

How to filter data using window functions in spark

给你一囗甜甜゛ submitted on 2019-12-21 05:40:48
Question: I have the following data:

    rowid uid time code
    1     1   5    a
    2     1   6    b
    3     1   7    c
    4     2   8    a
    5     2   9    c
    6     2   9    c
    7     2   10   c
    8     2   11   a
    9     2   12   c

Now I want to filter the data in such a way that I can remove rows 6 and 7, as for a particular uid I want to keep just one row with the value 'c' in code. So the expected data should be:

    rowid uid time code
    1     1   5    a
    2     1   6    b
    3     1   7    c
    4     2   8    a
    5     2   9    c
    8     2   11   a
    9     2   12   c

I'm using a window function, something like this:

    val window = Window.partitionBy("uid").orderBy("time")
    val
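One way to keep only the first row of each consecutive run of 'c' is to compare each row's code with the previous row's code via lag() over the same window. A minimal sketch in Spark SQL, assuming the data has been registered as a temporary view named events (the view name is an assumption):

    SELECT rowid, uid, time, code
    FROM (
        SELECT *,
               LAG(code) OVER (PARTITION BY uid ORDER BY time, rowid) AS prev_code
        FROM events
    ) t
    -- Drop a 'c' row only when the immediately preceding row was also 'c'.
    WHERE code <> 'c' OR prev_code IS NULL OR prev_code <> 'c';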

Aggregating connected sets of nodes / edges

断了今生、忘了曾经 submitted on 2019-12-21 05:05:08
Question: I have a connected set of edges with unique nodes. They are connected using a parent node. Consider the following example code and illustration:

    CREATE TABLE network (
        node integer PRIMARY KEY,
        parent integer REFERENCES network(node),
        length numeric NOT NULL
    );
    CREATE INDEX ON network (parent);
    INSERT INTO network (node, parent, length) VALUES
        (1, NULL, 1.3), (2, 1, 1.2), (3, 2, 0.9), (4, 3, 1.4),
        (5, 4, 1.6), (6, 2, 1.5), (7, NULL, 1.0);

Visually, two groups of edges can be identified. How
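One way to aggregate each connected group is to walk down from the root rows (parent IS NULL) with a recursive CTE, labelling every node with its root, and then aggregate per root. A minimal sketch against the sample data above; it assumes each connected set has exactly one root row:

    WITH RECURSIVE grouped AS (
        SELECT node, node AS root, length
        FROM network
        WHERE parent IS NULL
        UNION ALL
        SELECT n.node, g.root, n.length
        FROM network n
        JOIN grouped g ON n.parent = g.node
    )
    SELECT root, COUNT(*) AS edges, SUM(length) AS total_length
    FROM grouped
    GROUP BY root
    ORDER BY root;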

PostgreSQL window function: row_number() over (partition col order by col2)

风流意气都作罢 submitted on 2019-12-20 20:20:08
Question: The following result set is derived from a SQL query with a few joins and a union. The SQL query already groups rows on Date and game. I need a column describing the number of attempts at a game, partitioned by the date column.

    Username   Game    ID   Date
    johndoe1   Game_1  100  7/22/14 1:52 AM
    johndoe1   Game_1  100  7/22/14 1:52 AM
    johndoe1   Game_1  100  7/22/14 1:52 AM
    johndoe1   Game_1  100  7/22/14 1:52 AM
    johndoe1   Game_1  121  7/22/14 1:56 AM
    johndoe1   Game_1  121  7/22/14 1:56 AM
    johndoe1   Game_1  121  7/22/14 1:56 AM
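If each distinct ID within a username/game/day represents one attempt, DENSE_RANK() can number the attempts without collapsing the rows. A minimal sketch, assuming a hypothetical source named game_results with columns matching the excerpt above:

    SELECT username, game, id, date,
           DENSE_RANK() OVER (
               PARTITION BY username, game, CAST(date AS date)
               ORDER BY id
           ) AS attempt_number
    FROM game_results;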

Aggregate values over a range of hours, every hour

爷,独闯天下 submitted on 2019-12-20 18:22:01
Question: I have a PostgreSQL 9.1 database with a table containing a timestamp and a measured value:

    '2012-10-25 01:00'   2
    '2012-10-25 02:00'   5
    '2012-10-25 03:00'  12
    '2012-10-25 04:00'   7
    '2012-10-25 05:00'   1
    ...                 ...

I need to average the value over a range of 8 hours, every hour. In other words, I need the average of 1h-8h, 2h-9h, 3h-10h, etc. I have no idea how to proceed with such a query. I have looked everywhere but also have no clue what functionality to look for. The closest I find are hourly/daily
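A window frame over the ordered timestamps gives the rolling 8-hour average. A minimal sketch, assuming a hypothetical table measurements(ts, value) with exactly one row per hour (a ROWS frame counts rows, not hours, so gaps in the data would skew it):

    SELECT ts,
           AVG(value) OVER (
               ORDER BY ts
               ROWS BETWEEN CURRENT ROW AND 7 FOLLOWING
           ) AS avg_next_8h
    FROM measurements;

PostgreSQL 9.1 already supports this kind of ROWS frame.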

Spark SQL Row_number() PartitionBy Sort Desc

不羁岁月 submitted on 2019-12-20 10:36:07
Question: I've successfully created a row_number() partitionBy in Spark using Window, but would like to sort this descending instead of the default ascending. Here is my working code:

    from pyspark import HiveContext
    from pyspark.sql.types import *
    from pyspark.sql import Row, functions as F
    from pyspark.sql.window import Window

    data_cooccur.select("driver", "also_item", "unit_count",
        F.rowNumber().over(Window.partitionBy("driver").orderBy("unit_count")).alias("rowNum")).show()

That gives me this
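The same window can also be expressed in Spark SQL, where the descending sort is just ORDER BY ... DESC in the window specification. A minimal sketch, assuming the DataFrame has been registered as a temporary view named data_cooccur (the view name is an assumption):

    SELECT driver, also_item, unit_count,
           ROW_NUMBER() OVER (PARTITION BY driver ORDER BY unit_count DESC) AS rowNum
    FROM data_cooccur;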

How to make a SUM without group by

白昼怎懂夜的黑 submitted on 2019-12-20 09:41:43
Question: Here is my problem:

    Actual  Auction  Ammanat  id
    7000    500      100      228,229
    7000    100      100      228,229
    7000    900      100      228,229
    5000    0        0        230

I want the result as given below:

    Actual  Auction  Ammanat  Remaining  id
    7000    500      100      5550       228,229
    7000    100      100      5550       228,229
    7000    900      100      5550       228,229
    5000    0        0        5000       230

Here, Remaining is (sum(auction) - actual). I am using PostgreSQL, but if anyone knows a solution in SQL Server, that will be OK. Answer 1: You need to use a window function - http://www.postgresql.org/docs/9.3/static
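A minimal sketch of the window-function approach the answer points to, using the question's own formula (sum(auction) - actual, computed per group of id) and assuming a hypothetical table named auctions with the columns shown:

    SELECT actual, auction, ammanat,
           SUM(auction) OVER (PARTITION BY id) - actual AS remaining,
           id
    FROM auctions;

SQL Server supports the same SUM(...) OVER (PARTITION BY ...) syntax.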

How to use a SQL window function to calculate a percentage of an aggregate

人盡茶涼 submitted on 2019-12-20 09:19:41
Question: I need to calculate percentages of various dimensions in a table. I'd like to simplify things by using window functions to calculate the denominator; however, I am having an issue because the numerator has to be an aggregate as well. As a simple example, take the following table:

    create temp table test (d1 text, d2 text, v numeric);
    insert into test values ('a','x',5), ('a','y',5), ('a','y',10), ('b','x',20);

If I just want to calculate the share of each individual row out of d1, then
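When both the numerator and the denominator are aggregates, a window function can be applied on top of a regular aggregate, because window functions are evaluated after GROUP BY. A minimal sketch against the test table above, computing each d1 group's share of the grand total:

    SELECT d1,
           SUM(v) AS group_total,
           SUM(v) / SUM(SUM(v)) OVER () AS share_of_total
    FROM test
    GROUP BY d1;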