window-functions

Average quarterly sale with the previous quarter's average sale

Submitted by 微笑、不失礼 on 2019-12-19 11:42:11
Question: I have a table one in which there are various attributes like region, product, year, qtr, month, sale. I have to calculate the avg_qtr sale of each product within the same region and show its previous avg_qtr sale. I have read about lag, but it cannot be used here as-is, because it is not fixed after how many rows a value will repeat. My table structure is like this:

    Region  Product  Year  Qtr  Month  Sales
    NORTH   P1       2015  1    JAN    1000
    NORTH   P1       2015  1    FEB    2000
    NORTH   P1       2015  1    MAR    3000
    NORTH   P1       2015  2    APR    4000
    NORTH   ...
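
A minimal sketch of one approach, assuming the table is really named one with the columns shown above: aggregate to one row per quarter first; lag() then works, because each (region, product, year, qtr) group collapses to a single row.

    SELECT region, product, year, qtr,
           AVG(sales) AS avg_qtr_sale,
           LAG(AVG(sales)) OVER (PARTITION BY region, product
                                 ORDER BY year, qtr) AS prev_avg_qtr_sale
    FROM   one
    GROUP  BY region, product, year, qtr
    ORDER  BY region, product, year, qtr;

Applying LAG() to an aggregate is legal here because window functions are evaluated after GROUP BY.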

Invalid count and sum in cross tab query using PostgreSQL

Submitted by 为君一笑 on 2019-12-19 09:43:02
Question: I am using PostgreSQL 9.3. I have a situation where I want to count the number of product sales, sum the amounts per product, and also show the cities in which the product has sales as columns. Example setup:

    create table products (
      name  varchar(20),
      price integer,
      city  varchar(20)
    );

    insert into products values
      ('P1', 1200, 'London'),
      ('P1',  100, 'Melborun'),
      ('P1', 1400, 'Moscow'),
      ('P2', 1560, 'Munich'),
      ('P2', 2300, 'Shunghai'),
      ('P2', 3000, 'Dubai');

Crosstab query:

    select * ...
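
A minimal sketch of a crosstab that pivots the summed amounts, assuming the tablefunc extension is available. The usual cause of wrong counts and sums is a source query that is not aggregated, or an output column list that does not match the order of the category query; count(price) in place of sum(price) produces the sales counts the same way.

    CREATE EXTENSION IF NOT EXISTS tablefunc;

    SELECT *
    FROM   crosstab(
             $$SELECT name, city, sum(price)
               FROM   products
               GROUP  BY name, city
               ORDER  BY name, city$$,
             $$SELECT DISTINCT city FROM products ORDER BY city$$
           ) AS ct (name       varchar,
                    "Dubai"    bigint,
                    "London"   bigint,
                    "Melborun" bigint,
                    "Moscow"   bigint,
                    "Munich"   bigint,
                    "Shunghai" bigint);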

GROUP BY consecutive dates delimited by gaps

Submitted by 試著忘記壹切 on 2019-12-19 05:24:00
Question: Assume you have (in Postgres 9.1) a table like this:

    date | value

which has some gaps in it (I mean: not every possible date between min(date) and max(date) has its own row). My problem is how to aggregate this data so that each consecutive group (without gaps) is treated separately, like this:

    min_date | max_date | [some aggregate of "value" column]

Any ideas how to do it? I believe it is possible with window functions, but after a while trying with lag() and lead() I'm a little stuck. For ...
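
A minimal sketch of the classic gaps-and-islands trick, assuming the table is named t: subtracting a row number (in date order) from the date itself yields a value that stays constant within every gapless run, so it can serve as the grouping key.

    SELECT min(date)  AS min_date,
           max(date)  AS max_date,
           sum(value) AS value_sum    -- or any other aggregate of "value"
    FROM  (
      SELECT date, value,
             date - (row_number() OVER (ORDER BY date))::int AS grp
      FROM   t
    ) sub
    GROUP  BY grp
    ORDER  BY min_date;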

LAST_VALUE window function doesn't work properly

Submitted by 穿精又带淫゛_ on 2019-12-18 15:53:12
Question: The LAST_VALUE window function doesn't work properly.

    CREATE TABLE examp2 (
      customer_id NUMBER(38) NOT NULL,
      valid_from  DATE NOT NULL
    );

    Customer_id  Valid_from
    -----------  -------------------
    9775         06.04.2013 01:34:16
    9775         06.04.2013 20:34:00
    9775         12.04.2013 11:07:01

    select DISTINCT
           LAST_VALUE(valid_from) OVER (PARTITION BY customer_id
                                        ORDER BY valid_from ASC) rn
    from   examp2;

When I use LAST_VALUE, I get the following rows: 06.04.2013 20:34:00, 06.04 ...
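
This is documented behaviour rather than a bug: with an ORDER BY and no explicit frame clause, the window frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so LAST_VALUE never looks past the current row. A minimal sketch of the fix is to spell the frame out:

    SELECT DISTINCT
           LAST_VALUE(valid_from) OVER (
             PARTITION BY customer_id
             ORDER BY valid_from
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS last_valid_from
    FROM   examp2;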

Calculating the Weighted Average Cost of product stock

Submitted by 那年仲夏 on 2019-12-18 11:36:56
Question: I have to calculate my products' stock cost, so for every product, after each purchase, I have to recalculate the Weighted Average Cost. I have a view that gives me the current product stock after each movement in/out:

    document_type  document_date  product_id  qty_out  qty_in  price    row_num  stock_balance
    SI             01/01/2014     52          0        600     1037.28  1        600
    SI             01/01/2014     53          0        300     1357.38  2        300
    LC             03/02/2014     53          100      0       1354.16  3        200
    LC             03/02/2014     53          150      0       1355.25  4        50
    LC             03/02/2014     52          100      0       1035.26  5        500
    LC             03/02/2014     52          200      0       ...
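
After a purchase, the weighted average cost is (old_balance × old_avg + qty_in × price) / new_balance, and a sale leaves the average unchanged. Each row depends on the previous result, which a plain window function cannot express, so a minimal sketch (PostgreSQL syntax, assuming the view is named stock_view) walks the movements with a recursive CTE:

    WITH RECURSIVE wac AS (
      SELECT product_id, row_num, stock_balance,
             price AS avg_cost                     -- first movement is a purchase
      FROM   stock_view s
      WHERE  row_num = (SELECT min(row_num) FROM stock_view
                        WHERE  product_id = s.product_id)
      UNION ALL
      SELECT m.product_id, m.row_num, m.stock_balance,
             CASE WHEN m.qty_in > 0                -- purchase: blend old and new cost
                  THEN (w.avg_cost * (m.stock_balance - m.qty_in)
                        + m.price * m.qty_in) / m.stock_balance
                  ELSE w.avg_cost                  -- sale: average cost unchanged
             END
      FROM   wac w
      JOIN   stock_view m
        ON   m.product_id = w.product_id
       AND   m.row_num = (SELECT min(row_num) FROM stock_view
                          WHERE  product_id = w.product_id
                          AND    row_num > w.row_num)
    )
    SELECT * FROM wac ORDER BY product_id, row_num;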

User-defined function to be applied to a Window in PySpark?

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-18 04:17:10
Question: I am trying to apply a user-defined function to a Window in PySpark. I have read that a UDAF might be the way to go, but I was not able to find anything concrete. To give an example (taken from here: Xinh's Tech Blog and modified for PySpark):

    from pyspark import SparkConf
    from pyspark.sql import SparkSession
    from pyspark.sql.window import Window
    from pyspark.sql.functions import avg

    spark = SparkSession.builder.master("local").config(conf=SparkConf()).getOrCreate()
    a = spark.createDataFrame([ ...
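
Not the UDAF the question asks for, but worth checking first: many window computations are covered by Spark's built-in aggregates over an explicit frame. A minimal sketch in Spark SQL (run via spark.sql() after df.createOrReplaceTempView("t"); the columns id, ts and v are hypothetical):

    SELECT id, ts, v,
           avg(v) OVER (PARTITION BY id
                        ORDER BY ts
                        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg
    FROM   t;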

Ordered count of consecutive repeats / duplicates

Submitted by 六月ゝ 毕业季﹏ on 2019-12-17 21:33:16
Question: I highly doubt I'm doing this in the most efficient manner, which is why I tagged plpgsql here. I need to run this on 2 billion rows across a thousand measurement systems. The measurement systems often report the previous value when they lose connectivity; they lose connectivity in short spurts frequently, but sometimes for a long time. You need to aggregate, but when you do so, you need to look at how long the value was repeating and apply various filters based on that information. Say you are ...
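
A minimal sketch of one way to measure the runs, assuming a table readings(system_id, ts, value): lag() marks each row that starts a new run of repeated values, a running sum turns those marks into run ids, and a final GROUP BY yields each run's length, which downstream filters can then use.

    SELECT system_id, value,
           min(ts)  AS run_start,
           max(ts)  AS run_end,
           count(*) AS repeats
    FROM  (
      SELECT system_id, ts, value,
             sum(CASE WHEN is_new_run THEN 1 ELSE 0 END)
               OVER (PARTITION BY system_id ORDER BY ts) AS run_id
      FROM  (
        SELECT system_id, ts, value,
               value IS DISTINCT FROM
                 lag(value) OVER (PARTITION BY system_id ORDER BY ts) AS is_new_run
        FROM   readings
      ) marked
    ) runs
    GROUP  BY system_id, run_id, value
    ORDER  BY system_id, run_start;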

Group by repeating attribute

Submitted by ≯℡__Kan透↙ on 2019-12-17 21:16:26
Question: Basically I have a table messages, with a user_id field that identifies the user who created each message. When I display a conversation (a set of messages) between two users, I want to be able to group the messages by user_id, but in a tricky way. Let's say there are some messages (sorted by created_at desc):

    id: 1, user_id: 1
    id: 2, user_id: 1
    id: 3, user_id: 2
    id: 4, user_id: 2
    id: 5, user_id: 1

I want to get 3 message groups, in the below order: [1,2], [3,4], [5]. It should group by user_id ...
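
A minimal sketch, assuming the table is messages(id, user_id, created_at): the difference between an overall row number and a per-user row number is constant within each consecutive run of one user's messages, so it works as the group key.

    SELECT user_id,
           array_agg(id ORDER BY created_at DESC) AS message_ids
    FROM  (
      SELECT id, user_id, created_at,
             row_number() OVER (ORDER BY created_at DESC)
           - row_number() OVER (PARTITION BY user_id
                                ORDER BY created_at DESC) AS grp
      FROM   messages
    ) sub
    GROUP  BY user_id, grp
    ORDER  BY max(created_at) DESC;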

Conditional lead/lag function PostgreSQL?

Submitted by 拜拜、爱过 on 2019-12-17 19:47:00
Question: I have a table like this:

    Name   activity  time
    user1  A1        12:00
    user1  E3        12:01
    user1  A2        12:02
    user2  A1        10:05
    user2  A2        10:06
    user2  A3        10:07
    user2  M6        10:07
    user2  B1        10:08
    user3  A1        14:15
    user3  B2        14:20
    user3  D1        14:25
    user3  D2        14:30

Now, I need a result like this:

    Name   activity  next_activity
    user1  A2        NULL
    user2  A3        B1
    user3  A1        B2

I would like to check, for every user, the last activity from group A and what type of activity from group B took place next (an activity from group B always takes place after ...
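
A minimal sketch (PostgreSQL, assuming the table is named activity_log): DISTINCT ON picks each user's last A-activity, and a correlated subquery finds the first B-activity that follows it.

    SELECT a.name, a.activity,
           (SELECT b.activity
            FROM   activity_log b
            WHERE  b.name = a.name
            AND    b.activity LIKE 'B%'
            AND    b.time > a.time
            ORDER  BY b.time
            LIMIT  1) AS next_activity
    FROM  (
      SELECT DISTINCT ON (name) name, activity, time
      FROM   activity_log
      WHERE  activity LIKE 'A%'
      ORDER  BY name, time DESC
    ) a
    ORDER  BY a.name;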

Calculating SQL Server ROW_NUMBER() OVER() for a derived table

Submitted by 天大地大妈咪最大 on 2019-12-17 19:39:23
Question: In some other databases (e.g. DB2, or Oracle with ROWNUM), I can omit the ORDER BY clause in a ranking function's OVER() clause. For instance:

    ROW_NUMBER() OVER()

This is particularly useful when used with ordered derived tables, such as:

    SELECT t.*, ROW_NUMBER() OVER()
    FROM (
      SELECT ... ORDER BY ...
    ) t

How can this be emulated in SQL Server? I've found people using this trick, but that's wrong, as it behaves non-deterministically with respect to the order of the derived table:

    -- This ...
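
Not the non-deterministic trick the question warns about; the reliable emulation is to repeat the derived table's ordering expression inside OVER(). A minimal sketch, where some_table and sort_col are hypothetical names:

    SELECT t.*,
           ROW_NUMBER() OVER (ORDER BY t.sort_col) AS rn
    FROM  (
      SELECT id, sort_col
      FROM   some_table
    ) t
    ORDER  BY t.sort_col;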