window-functions

How to subtract from the previous row's result?

Submitted by a 夏天 on 2019-12-12 06:57:39
Question: I need to subtract the first row's lot_size, plus each subtraction result so far, from the currentitems column. If no balance is left, the result should be 0. The bal column shows how the result should look.

     rowno | location | lot_size | currentitems |  bal | bal_left
    -------+----------+----------+--------------+------+----------
         1 | AB1210   |     1200 |         1000 | 1000 |      200
         2 | AB1220   |     1200 |         1000 |  200 |        0
         3 | AB1230   |     1200 |          500 |    0 |        0

Current approach (using PostgreSQL 9.3.1):

    SELECT row_number() over (ORDER BY location) as rowno,
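Based on the sample output, bal appears to be what can still be drawn from the lot at each row, and bal_left the remainder afterwards. A minimal sketch of both as window functions, assuming a hypothetical table stock(location, lot_size, currentitems) with lot_size constant across rows:

    SELECT row_number() OVER (ORDER BY location) AS rowno,
           location,
           lot_size,
           currentitems,
           -- what can still be taken from the lot at this row, floored at 0
           GREATEST(LEAST(currentitems,
                          lot_size - COALESCE(SUM(currentitems) OVER (ORDER BY location
                                     ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0)),
                    0) AS bal,
           -- lot size minus everything consumed up to and including this row, floored at 0
           GREATEST(lot_size - SUM(currentitems) OVER (ORDER BY location), 0) AS bal_left
    FROM stock;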

Mysterious error when combining lead function, second window function and column alias

Submitted by 一世执手 on 2019-12-12 06:00:12
Question: Consider the following query:

    select
      corpus_date as alias,
      lead(word, 1) over (partition by corpus order by word_count desc) lead,
      max(word_count) over (partition by corpus) max_word_count
    from [publicdata:samples.shakespeare]
    where corpus = 'othello' and length(word) > 10
    limit 5

This gives me the error message "Field 'alias' not found". But alias is only used as an alias in this query. Note also that the error disappears if I comment out either the alias, the lead function, or the min
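One workaround that often sidesteps this class of legacy BigQuery aliasing bug is to compute the window functions in an inner query and select the alias from an outer one; a sketch, not confirmed by the source:

    select alias, lead_word, max_word_count
    from (
      select
        corpus_date as alias,
        lead(word, 1) over (partition by corpus order by word_count desc) as lead_word,
        max(word_count) over (partition by corpus) as max_word_count
      from [publicdata:samples.shakespeare]
      where corpus = 'othello' and length(word) > 10
    )
    limit 5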

Spark Couldn't Find Window Function

Submitted by 浪尽此生 on 2019-12-12 02:46:06
Question: Using the solution provided in https://stackoverflow.com/a/32407543/5379015, I tried to recreate the same query, but using the programmatic syntax instead of the DataFrame API, as follows:

    import org.apache.spark.{SparkContext, SparkConf}
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    object HiveContextTest {
      def main(args: Array[String]) {
        val conf = new SparkConf().setAppName("HiveContextTest")
        val sc = new
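Window functions in Spark 1.x require a HiveContext rather than a plain SQLContext. As a point of comparison, the kind of ranking query involved can also be written as SQL text and run through hiveContext.sql(...); a sketch with a hypothetical registered table and columns:

    -- runnable via hiveContext.sql("...") once a table named records is registered
    SELECT category,
           value,
           ROW_NUMBER() OVER (PARTITION BY category ORDER BY value DESC) AS rn
    FROM records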

How to get new and returning YTD, MTD, and WTD users from a user traffic table?

Submitted by 北城余情 on 2019-12-12 01:41:42
Question: I would like to get all the new and returning users from a user_traffic table. I'm just wondering what the approach would be for solving this problem. Any thoughts/inputs would be appreciated. I am not expecting a ready-made solution, but any kind of directional input would help me. Thank you.

user_traffic: session_id, session_day, glance_view_id, user_id, product_id

Sample input:

    create table user_traffic (
      session_id number(6),
      session_day date,
      user_id number(6),
      product_id
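One direction (a sketch, not from the source): tag each session as new or returning by comparing its day to the user's first-ever session day with a window function, then aggregate over whatever period you need:

    SELECT user_id,
           session_day,
           -- a user's very first session day marks them as new; anything later is a return
           CASE WHEN session_day = MIN(session_day) OVER (PARTITION BY user_id)
                THEN 'new'
                ELSE 'returning'
           END AS user_status
    FROM user_traffic;

Counting DISTINCT user_id per status within TRUNC(session_day, 'IW') / 'MM' / 'YYYY' buckets then gives the WTD, MTD, and YTD figures.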

Given a time interval, calculate open/high/low/close values for each group of data

Submitted by 旧城冷巷雨未停 on 2019-12-11 20:22:32
Question: Suppose the raw data is:

    Timestamp      High  Low  Volume
    10:24.22345     100   99      10
    10:24.23345     110   97      20
    10:24.33455      97   89      40
    10:25.33455      60   40      50
    10:25.93455      40   20      60

With a sample time of 1 second, the output data should be as follows (grouped by second):

    Timestamp  Open  Close  High  Low  Volume
    10:24        82     83   110   89      70
    10:25        50     40    60   20     110

Open means the price of the earliest row in the group.
Close means the price of the latest row in the group.
Volume means sum(Volume) over the group.
The
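A window-function sketch of this kind of OHLC aggregation, assuming PostgreSQL and a hypothetical table ticks(ts, price, high, low, volume). Note that last_value needs an explicit frame reaching to the end of the partition; with the default frame it only sees rows up to the current one:

    SELECT DISTINCT
           date_trunc('second', ts)   AS bucket,
           first_value(price) OVER w  AS open,
           last_value(price)  OVER w  AS close,
           max(high)          OVER w  AS high,
           min(low)           OVER w  AS low,
           sum(volume)        OVER w  AS volume
    FROM ticks
    WINDOW w AS (PARTITION BY date_trunc('second', ts) ORDER BY ts
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);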

Hive - over (partition by …) with a column not in group by

Submitted by 情到浓时终转凉″ on 2019-12-11 19:06:55
Question: Is it possible to do something like:

    select avg(count(distinct user_id)) over (partition by some_date) as average_users_per_day
    from user_activity
    group by user_type

(notably, the partition-by column, some_date, is not in the group-by columns)

The idea I'm going for is something like: the average users per day, by user type. I know how to do it using subqueries (see below), but I'd like to know if there is a nice way using only over (partition by ...) and group by. Notes: From reading this
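For reference, the subquery form the asker alludes to would look roughly like this: first count distinct users per (user_type, day), then average those daily counts per user type:

    SELECT user_type,
           AVG(users_per_day) AS average_users_per_day
    FROM (
      SELECT user_type,
             some_date,
             COUNT(DISTINCT user_id) AS users_per_day
      FROM user_activity
      GROUP BY user_type, some_date
    ) t
    GROUP BY user_type;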

Calculating a Running Total with the OVER Clause and PARTITION BY Clause, with a Counter

Submitted by 半城伤御伤魂 on 2019-12-11 18:38:40
Question: How do I calculate the sum of points for each week_number and user_name using a counter? I use sum() OVER (PARTITION BY ... ORDER BY ...), and I would like the running sum to max out at 10. How can I do this? (The original table and the desired result were shown as images in the source question.) Here is the SQL code:

    SELECT week_number,
           user_name,
           sum(points) OVER (PARTITION BY user_name ORDER BY week_number) AS total_prime
    FROM team;

Answer 1: Try this:

    select week_number, user_name, points,
           case when total_prime > 10 then 10 else
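A completed version of the pattern that answer starts; the inner query and the else branch here are my reconstruction of the truncated excerpt, not the original answer's text:

    SELECT week_number,
           user_name,
           points,
           -- clamp the displayed running total at 10
           CASE WHEN total_prime > 10 THEN 10 ELSE total_prime END AS total_prime
    FROM (
      SELECT week_number,
             user_name,
             points,
             SUM(points) OVER (PARTITION BY user_name ORDER BY week_number) AS total_prime
      FROM team
    ) t;

Note that this only clamps the displayed value; points keep accumulating underneath the cap. A running total that truly stops at 10 and carries the cap forward would need a recursive query instead.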

Apply a ranking window function in dbplyr backend

Submitted by 不羁的心 on 2019-12-11 17:34:53
Question: I want to seamlessly identify new orders (acquisitions) and returns in my transactional database table. This sounds like the perfect job for a window function; I would like to perform this operation in dbplyr. My current process is to:

1. Create a query object that I then pass to dbGetQuery(); this query contains a standard rank() window function as usually seen in PostgreSQL.
2. Ingest this query's result into my R environment.
3. Then, using ifelse() inside the mutate() verb, I identify the first orders
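The SQL such a query object typically wraps is a per-customer rank over order dates; a sketch with hypothetical table and column names (dbplyr generates roughly this when min_rank() is used inside a grouped mutate()):

    SELECT customer_id,
           order_id,
           order_date,
           RANK() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_rank
    FROM transactions;

Rows with order_rank = 1 are then the acquisitions; later ranks are repeat orders.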

Lag column by group in dplyr [closed]

Submitted by 懵懂的女人 on 2019-12-11 17:28:48
Question: I have a data frame mydata such as the following:

      col1 col2
    1    1    1
    2    1    2
    3    1    3
    4    2    1
    5    2    2
    6    2    3

I want to lag col2 within the groups in col1, so my expected result would be the following:

      col1 col2
    1    1   NA
    2    1    1
    3    1    2
    4    2   NA
    5    2    1
    6    2    2

Following the procedure from this answer, I try

    with_lagged_col2 = mydata %>% group
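For comparison, the same grouped lag as window-function SQL; a sketch that assumes the row numbers shown above are materialized as an id column:

    SELECT col1,
           -- previous col2 within each col1 group, NULL on the first row of a group
           LAG(col2) OVER (PARTITION BY col1 ORDER BY id) AS col2
    FROM mydata;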

Unexpected behavior of window function first_value

Submitted by ☆樱花仙子☆ on 2019-12-11 15:41:28
Question: I have two columns: OrderNo and Value. Table value constructor:

    (1, null), (2, 5), (3, null), (4, null), (5, 2), (6, 1)

I need to get:

    (1, 5) -- i.e. the first non-null Value going from the current row, ordered by OrderNo
    (2, 5)
    (3, 2) -- i.e. the first non-null Value going from the current row, ordered by OrderNo
    (4, 2) -- analogous
    (5, 2)
    (6, 1)

This is the query that I think should work:

    ;with SourceTable as (
      select *
      from (values
        (1, null),
        (2, 5),
        (3, null),
        (4, null),
        (5, 2),
        (6, 1)
      ) as T(OrderNo,
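first_value does not skip nulls here (this era of SQL Server has no IGNORE NULLS option), which is the likely source of the surprise. One working alternative is a correlated lookup per row; a sketch reusing the question's data:

    with SourceTable(OrderNo, Value) as (
      select * from (values
        (1, null), (2, 5), (3, null), (4, null), (5, 2), (6, 1)
      ) as T(OrderNo, Value)
    )
    select s.OrderNo, x.Value
    from SourceTable s
    outer apply (
      -- first non-null Value at or after the current row, in OrderNo order
      select top 1 t.Value
      from SourceTable t
      where t.OrderNo >= s.OrderNo and t.Value is not null
      order by t.OrderNo
    ) x;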