cumulative-sum | 易学教程

Get cumulative count per 2d array

I have general data, e.g. strings:

    import numpy as np

    np.random.seed(343)
    arr = np.sort(np.random.randint(5, size=(10, 10)), axis=1).astype(str)
    print (arr)

    [['0' '1' '1' '2' '2' '3' '3' '4' '4' '4']
     ['1' '2' '2' '2' '3' '3' '3' '4' '4' '4']
     ['0' '2' '2' '2' '2' '3' '3' '4' '4' '4']
     ['0' '1' '2' '2' '3' '3' '3' '4' '4' '4']
     ['0' '1' '1' '1' '2' '2' '2' '2' '4' '4']
     ['0' '0' '1' '1' '2' '3' '3' '3' '4' '4']
     ['0' '0' '2' '2' '2' '2' '2' '2' '3' '4']
     ['0' '0' '1' '1' '1' '2' '2' '2' '3' '3']
     ['0' '1' '1' '2' '2' '2' '3' '4' '4' '4']
     ['0' '1' '1' '2' '2' '2' '2' '2' '4' '4']]

I need count with reset if difference

Running sum on a column conditional on value

I have a vector of binary variables which state whether a product is on promotion in the period. I'm trying to work out how to calculate the duration of each promotion and the duration between promotions.

    promo.flag = c(1,1,0,1,0,0,1,1,1,0,1,1,0)

In other words: if promo.flag is the same as in the previous period, then running.total + 1; otherwise running.total is reset to 1.

I've tried playing with apply functions and cumsum but can't manage to get the conditional reset of the running total working :-(

The output I need is:

    promo.flag  = c(1,1,0,1,0,0,1,1,1,0,1,1,0)
    rolling.sum = c(1,2,1,1,1,2,1,2,3,1,1,2,1)
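The reset rule described in this question can be sketched in a few lines of Python. This is only an illustration of the logic, not an R answer: the function name `run_lengths` is hypothetical, and it relies on the standard library's `itertools.groupby`, which splits a sequence into runs of equal consecutive values. Following the stated rule literally, the final period (a single 0 after a run of 1s) gets a count of 1.

```python
from itertools import groupby


def run_lengths(flags):
    """Count up within each run of equal values, restarting at 1 on a change."""
    counts = []
    for _, run in groupby(flags):
        # Number the members of this run 1, 2, 3, ...
        counts.extend(position for position, _ in enumerate(run, start=1))
    return counts


promo_flag = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0]
rolling_sum = run_lengths(promo_flag)
print(rolling_sum)
```

Because each run is numbered independently, the same counter gives both the duration of each promotion (runs of 1) and the gap between promotions (runs of 0).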

Pyspark - Cumulative sum with reset condition

Question: I have this dataframe:

    +---+----+---+
    |  A|   B|  C|
    +---+----+---+
    |  0|null|  1|
    |  1| 3.0|  0|
    |  2| 7.0|  0|
    |  3|null|  1|
    |  4| 4.0|  0|
    |  5| 3.0|  0|
    |  6|null|  1|
    |  7|null|  1|
    |  8|null|  1|
    |  9| 5.0|  0|
    | 10| 2.0|  0|
    | 11|null|  1|
    +---+----+---+

What I need to do is compute a cumulative sum of the values in column C until the next value is zero, then reset the cumulative sum, and continue this way through all rows.

Expected output:

    +---+----+---+----+
    |  A|   B|  C|   D|
    +---+----+---+----+
    |  0|null|  1|   1|
    |  1| 3.0|  0|   0|
    |  2|
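As a rough illustration of the requested logic (plain Python, not PySpark), the reset can be expressed as a fold over column C. The function name `cumsum_reset_on_zero` is hypothetical, and the sketch assumes the rows are already ordered by column A; it says nothing about how to distribute the computation in Spark.

```python
from itertools import accumulate


def cumsum_reset_on_zero(values):
    """Running sum that drops back to 0 whenever the current value is 0."""
    # accumulate passes the running total and the next value to the lambda;
    # a zero input discards the total, anything else is added to it.
    return list(accumulate(values, lambda total, v: 0 if v == 0 else total + v))


# Column C from the question, in order of column A.
c = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1]
d = cumsum_reset_on_zero(c)
print(d)
```

The first two values of `d` agree with the rows of the expected output that are visible in the question.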

SQL Server - Cumulative Sum that resets when 0 is encountered

I would like to do a cumulative sum on a column, but reset the aggregated value whenever a 0 is encountered. Here is an example of what I am trying to do.

This dataset:

    pk  price
    1   10
    2   15
    3   0
    4   10
    5   5

gives this:

    pk  price
    1   10
    2   25
    3   0
    4   10
    5   15

In SQL Server 2008, you are severely limited because ordered window aggregates such as SUM(...) OVER (ORDER BY ...) are not available. The following is not efficient, but it will solve your problem:

    with tg as (
          select t.*, g.grp
          from t cross apply
               -- grp = number of zero-price rows at or before this pk
               (select count(*) as grp
                from t t2
                where t2.pk <= t.pk and t2.price = 0
               ) g
         )
    select tg.*, p.running_price
    from tg cross apply
         (select sum(tg2.price) as running
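The grouping idea behind that query can be sketched in Python to make the two steps explicit. This assumes the intent of the first subquery is to count the rows whose price is 0 at or before the current pk, so that every zero starts a new group; the function name `running_price` and the tuple layout are illustrative only.

```python
from itertools import accumulate, groupby


def running_price(rows):
    """rows: iterable of (pk, price). Returns (pk, running_price) pairs."""
    ordered = sorted(rows)  # order by pk

    # Step 1: tag each row with grp, the number of zero-price rows seen so far.
    zero_counts = accumulate(1 if price == 0 else 0 for _, price in ordered)
    tagged = [(pk, price, grp) for (pk, price), grp in zip(ordered, zero_counts)]

    # Step 2: running sum of price inside each grp.
    result = []
    for _, members in groupby(tagged, key=lambda row: row[2]):
        members = list(members)
        sums = accumulate(price for _, price, _ in members)
        result.extend((pk, total) for (pk, _, _), total in zip(members, sums))
    return result


table = [(1, 10), (2, 15), (3, 0), (4, 10), (5, 5)]
print(running_price(table))
```

Each zero row opens its own group and contributes 0 to it, which is why the sum appears to restart there.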

Creating a cumulative step graph in R

Say I have this example data frame:

    set.seed(12345)
    n1 <- 3
    n2 <- 10
    n3 <- 60
    times <- seq(0, 100, 0.5)
    individual <- c(rep(1, n1), rep(2, n2), rep(3, n3))
    events <- c(sort(sample(times, n1)), sort(sample(times, n2)), sort(sample(times, n3)))
    df <- data.frame(individual = individual, events = events)

Which gives:

    > head(df, 10)
       individual events
    1           1   72.0
    2           1   75.5
    3           1   87.5
    4           2    3.0
    5           2   14.5
    6           2   16.5
    7           2   32.0
    8           2   45.5
    9           2   50.0
    10          2   70.5

I would like to plot a cumulative step graph of the events so that I get one line per individual which goes up by 1 each time an event is "encountered". So, for
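The data behind such a step graph is just each individual's event times paired with a running count. The Python sketch below computes those pairs from the ten rows shown by head(df, 10); it does not draw anything, and the names `cumulative_steps` and `records` are hypothetical. The resulting (time, count) pairs could then be handed to any step-plotting routine, for example `matplotlib.pyplot.step`.

```python
from collections import defaultdict


def cumulative_steps(records):
    """records: iterable of (individual, event_time).

    Returns {individual: [(event_time, cumulative_count), ...]}.
    """
    times_by_individual = defaultdict(list)
    for individual, event_time in records:
        times_by_individual[individual].append(event_time)

    # The n-th event in time order lifts that individual's line to height n.
    return {
        individual: [(t, n) for n, t in enumerate(sorted(times), start=1)]
        for individual, times in times_by_individual.items()
    }


records = [
    (1, 72.0), (1, 75.5), (1, 87.5),
    (2, 3.0), (2, 14.5), (2, 16.5), (2, 32.0), (2, 45.5), (2, 50.0), (2, 70.5),
]
steps = cumulative_steps(records)
print(steps[1])
```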

Pyspark : Cumulative Sum with reset condition

We have a dataframe like the one below:

    +------+--------------------+
    | Flag |               value|
    +------+--------------------+
    |1     |5                   |
    |1     |4                   |
    |1     |3                   |
    |1     |5                   |
    |1     |6                   |
    |1     |4                   |
    |1     |7                   |
    |1     |5                   |
    |1     |2                   |
    |1     |3                   |
    |1     |2                   |
    |1     |6                   |
    |1     |9                   |
    +------+--------------------+

After a normal cumsum we get this:

    +------+--------------------+----------+
    | Flag |               value|cumsum    |
    +------+--------------------+----------+
    |1     |5                   |5         |
    |1     |4                   |9         |
    |1     |3                   |12        |
    |1     |5                   |17        |
    |1     |6                   |23        |
    |1     |4                   |27        |
    |1     |7                   |34        |
    |1     |5                   |39        |
    |1     |2                   |41        |
    |1     |3                   |44        |
    |1     |2                   |46        |
    |1     |6                   |52        |
    |1     |9                   |61        |
    +------+--------------------+----------+

Now what we want is for

How do I do a conditional sum which only looks between certain date criteria

Question: Say I have data that looks like:

    date,       user, items_bought, event_number
    2013-01-01, x,    2,            1
    2013-01-02, x,    1,            2
    2013-01-03, x,    0,            3
    2013-01-04, x,    0,            4
    2013-01-04, x,    1,            5
    2013-01-04, x,    2,            6
    2013-01-05, x,    3,            7
    2013-01-06, x,    1,            8
    2013-01-01, y,    1,            1
    2013-01-02, y,    1,            2
    2013-01-03, y,    0,            3
    2013-01-04, y,    5,            4
    2013-01-05, y,    6,            5
    2013-01-06, y,    1,            6

To get the cumulative sum per user per data point I was doing:

    data.frame(cum_items_bought=unlist(tapply(as.numeric(data$items_bought), data$user,
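For reference, the per-user cumulative sum that the question starts from can be sketched in Python as follows. This covers only that starting point, not the date-window condition named in the title, which is cut off in the excerpt. The function name `cumulative_items` is hypothetical, and the sketch assumes each user's rows arrive in event_number order.

```python
from collections import defaultdict


def cumulative_items(rows):
    """rows: iterable of (date, user, items_bought, event_number).

    Returns (user, event_number, cum_items_bought) for every row.
    """
    totals = defaultdict(int)  # running total per user
    result = []
    for _date, user, items_bought, event_number in rows:
        totals[user] += items_bought
        result.append((user, event_number, totals[user]))
    return result


rows = [
    ("2013-01-01", "x", 2, 1), ("2013-01-02", "x", 1, 2),
    ("2013-01-03", "x", 0, 3), ("2013-01-04", "x", 0, 4),
    ("2013-01-04", "x", 1, 5), ("2013-01-04", "x", 2, 6),
    ("2013-01-05", "x", 3, 7), ("2013-01-06", "x", 1, 8),
    ("2013-01-01", "y", 1, 1), ("2013-01-02", "y", 1, 2),
    ("2013-01-03", "y", 0, 3), ("2013-01-04", "y", 5, 4),
    ("2013-01-05", "y", 6, 5), ("2013-01-06", "y", 1, 6),
]
cumulative = cumulative_items(rows)
print([total for user, _, total in cumulative if user == "x"])
```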