rolling-computation

pandas rolling() function with monthly offset

一个人想着一个人 submitted on 2021-02-20 09:27:25
Question: I'm trying to use the rolling() function on a pandas DataFrame with monthly data. However, I dropped some NaN values, so there are now gaps in my time series. As a result, the basic window parameter gives a misleading answer, since it just looks at the previous observations rather than the previous months:

    import pandas as pd
    import numpy as np
    import random
    dft = pd.DataFrame(np.random.randint(0,10,size=len(dt)),index=dt)
    dft.columns = ['value']
    dft['value'] = np.where(dft['value'] < 3,np.nan,dft['value'])
    dft = dft
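One possible way to make an integer window respect calendar months is to put the dropped months back as NaN before rolling. The sketch below is only an illustration under stated assumptions: the index `idx`, the values, and the names `gappy`, `naive` and `by_calendar` are hypothetical, since the question's own `dt` index is not shown.

```python
import pandas as pd

# Hypothetical month-start index and values (not taken from the question).
idx = pd.date_range("2020-01-01", periods=8, freq="MS")
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], index=idx)

# Drop March and April to mimic the gaps left behind by dropna().
gappy = s.drop([idx[2], idx[3]])

# Row-count window: the 3 most recent *rows*, so it reaches across the gap.
naive = gappy.rolling(3, min_periods=1).sum()

# Re-insert the missing months as NaN so that 3 rows == 3 calendar months,
# roll, then keep only the dates that were actually observed.
regular = gappy.asfreq("MS")
by_calendar = regular.rolling(3, min_periods=1).sum().reindex(gappy.index)

print(pd.DataFrame({"naive": naive, "by_calendar": by_calendar}))
```

For May, the row-count window sums January, February and May, while the calendar-aligned window only sees May, because March and April are missing.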

Apply rolling function on pandas dataframe with multiple arguments

假如想象 submitted on 2021-02-19 16:35:53
Question: I am trying to apply a rolling function, with a 3-year window, to a pandas DataFrame.

    import numpy as np
    import pandas as pd
    # Dummy data
    df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                       'Year': [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018],
                       'IB': [2, 5, 8, 10, 7, 5, 10, 14],
                       'OB': [5, 8, 10, 12, 5, 10, 14, 20],
                       'Delta': [2, 2, 1, 3, -1, 3, 2, 4]})
    # The function to be applied
    def get_ln_rate(ib, ob, delta):
        n_years = len(ib)
        return sum(delta)*np.log(ob[-1]/ib[0]) / (n_years * (ob[-1]
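Because `rolling().apply()` hands the function one column at a time, one workaround is to slide the window by hand within each product and pass slices of several columns at once. The following is a minimal sketch, not the asker's method: `window_stat` is a hypothetical stand-in (the question's `get_ln_rate` is truncated), and `rolling_multi` is an illustrative helper name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Product": ["A"] * 4 + ["B"] * 4,
    "Year": [2015, 2016, 2017, 2018] * 2,
    "IB": [2, 5, 8, 10, 7, 5, 10, 14],
    "OB": [5, 8, 10, 12, 5, 10, 14, 20],
    "Delta": [2, 2, 1, 3, -1, 3, 2, 4],
})

def window_stat(ib, ob, delta):
    # Hypothetical stand-in for the truncated get_ln_rate: it only shows
    # that all three window slices arrive as NumPy arrays.
    return delta.sum() * np.log(ob[-1] / ib[0])

def rolling_multi(group, window, func, cols):
    """Apply func to aligned trailing windows of several columns."""
    out = np.full(len(group), np.nan)
    arrays = [group[c].to_numpy() for c in cols]
    for end in range(window, len(group) + 1):
        out[end - 1] = func(*(a[end - window:end] for a in arrays))
    return pd.Series(out, index=group.index)

df = df.sort_values(["Product", "Year"])
# One result Series per product; concat keeps the original row labels,
# so the assignment aligns on the index.
df["stat"] = pd.concat(
    [rolling_multi(g, 3, window_stat, ["IB", "OB", "Delta"])
     for _, g in df.groupby("Product")]
)
print(df)
```

The first two rows of each product stay NaN because a full 3-year window is not yet available. The Python loop is simple rather than fast, which may matter for large frames.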

non fixed rolling window

亡梦爱人 submitted on 2021-02-16 21:09:40
Question: I am looking to implement a rolling window on a list, but instead of a fixed window length, I would like to provide a list of window lengths. Something like this:

    l1 = [5, 3, 8, 2, 10, 12, 13, 15, 22, 28]
    l2 = [1, 2, 2, 2, 3, 4, 2, 3, 5, 3]
    get_custom_roling( l1, l2, np.average)

and the result would be:

    [5, 4, 5.5, 5, 6.67, ....]

Here 6.67 is calculated as the average of the 3 elements 10, 2, 8. I implemented a slow solution, and any idea to make it quicker is welcome :):

    import numpy as np
    def get_the
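For the specific case of an average, one vectorized option is to take differences of a cumulative sum, so each window costs constant time regardless of its length. This is a sketch under the assumption that each window ends at the current element and is clipped at the start of the list; `variable_window_mean` is a hypothetical name, and the approach does not generalize to arbitrary functions the way the question's callback argument does.

```python
import numpy as np

def variable_window_mean(values, windows):
    """Trailing mean where element i uses its own window length windows[i]."""
    values = np.asarray(values, dtype=float)
    windows = np.asarray(windows)
    ends = np.arange(1, len(values) + 1)        # exclusive end of each window
    starts = np.maximum(ends - windows, 0)      # clip windows at the list start
    csum = np.concatenate(([0.0], np.cumsum(values)))
    return (csum[ends] - csum[starts]) / (ends - starts)

l1 = [5, 3, 8, 2, 10, 12, 13, 15, 22, 28]
l2 = [1, 2, 2, 2, 3, 4, 2, 3, 5, 3]
result = variable_window_mean(l1, l2)
print(np.round(result, 2))
```

With the lists from the question, the first five values come out as 5, 4, 5.5, 5 and 6.67, matching the expected result quoted above.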

duplicating records between date gaps within a selected time interval in a PySpark dataframe

守給你的承諾、 submitted on 2021-02-08 09:45:10
Question: I have a PySpark dataframe that keeps track of changes that occur in a product's price and status over months. This means that a new row is created only when a change occurred (in either status or price) compared to the previous month, as in the dummy data below:

    ----------------------------------------
    |product_id| status    | price| month  |
    ----------------------------------------
    |1         | available | 5    | 2019-10|
    ----------------------------------------
    |1         | available | 8    | 2020-08|
    ------------

Rolling average calculating some values it shouldn't?

我的未来我决定 submitted on 2021-01-29 14:00:02
Question: Going off my question here, I was redirected to another thread and was able to adapt the code presented in that answer to get to where I want to be. I'm running into one problem now, though, and I'm a bit confused as to how it's coming about. My dataframe in essence looks as follows:

    Date  HomeTeam  AwayTeam  HGoals  AGoals  HGRollA  AGRollA
    1/1   AAA       BBB       4       2       2.67     1.67

Link to a more detailed image of said dataframe with some extra columns. Basically, every row has:
-the date of the match
-the home

R - Rolling sum of two columns in data.table

故事扮演 submitted on 2021-01-25 07:31:16
Question: I have a data.table as follows:

    dt = data.table(
      date = seq(as.Date("2015-12-01"), as.Date("2015-12-10"), by="days"),
      v1 = c(seq(1, 9), 20),
      v2 = c(5, rep(NA, 9))
    )
    dt
              date v1 v2
     1: 2015-12-01  1  5
     2: 2015-12-02  2 NA
     3: 2015-12-03  3 NA
     4: 2015-12-04  4 NA
     5: 2015-12-05  5 NA
     6: 2015-12-06  6 NA
     7: 2015-12-07  7 NA
     8: 2015-12-08  8 NA
     9: 2015-12-09  9 NA
    10: 2015-12-10 20 NA

Question 1: I want to add the current row value of v1 to the previous row value of v2 so the output looks like the following

Rolling idxmin/max for pandas DataFrame

随声附和 submitted on 2021-01-24 06:56:32
Question: I believe the following function is a working solution for rolling argmin/argmax on a pandas DataFrame:

    import numpy as np
    def data_frame_rolling_arg_func(df, window_size, func):
        ws = window_size
        wm1 = window_size - 1
        return (df.rolling(ws).apply(getattr(np, f'arg{func}'))[wm1:].astype(int) +
                np.array([np.arange(len(df) - wm1)]).T).applymap(
            lambda x: df.index[x]).combine_first(df.applymap(lambda x: np.nan))

It is inspired by a partial solution for rolling idxmax on a pandas Series. Explanations:
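The same idea (window-relative argmin/argmax plus the window's starting offset, mapped back to index labels) can also be sketched with NumPy's `sliding_window_view`, which avoids the per-element `apply` and `applymap` calls. This is an illustrative alternative, not the asker's function: `rolling_idx` is a hypothetical name, NumPy 1.20 or newer is assumed, and NaN values inside windows are not handled specially.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def rolling_idx(df, window, func="max"):
    """Index label of the rolling min/max for every column (func: 'min' or 'max')."""
    arg = getattr(np, f"arg{func}")
    values = df.to_numpy()
    # Shape (n_rows - window + 1, n_cols, window): one trailing window per row.
    views = sliding_window_view(values, window, axis=0)
    # Position inside each window, shifted by the row where that window starts.
    pos = arg(views, axis=-1) + np.arange(len(df) - window + 1)[:, None]
    labels = np.full(values.shape, np.nan, dtype=object)
    labels[window - 1:] = df.index.to_numpy()[pos]
    return pd.DataFrame(labels, index=df.index, columns=df.columns)

# Hypothetical example data.
df = pd.DataFrame({"a": [1, 3, 2, 5, 4], "b": [5, 4, 3, 2, 1]},
                  index=list("vwxyz"))
out = rolling_idx(df, 3, "max")
print(out)
```

Rows before the first complete window are left as NaN, mirroring the `combine_first` step in the function above.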

BaseIndexer still broken with .count() and .min()?

大城市里の小女人 submitted on 2021-01-07 02:48:43
Question: I use a custom window function in pandas. This works fine for things like .mean() and .sum(). As I understand it, other aggregations like .count() and .min() used to have problems, but these should now be fixed. Currently .count() uses the internal roll_count function, AFAICS. But I still don't get the expected results:

    import numpy as np
    import pandas as pd
    # Use largest most recent multiple of *modulo* past measurements:
    class ModuloIndexer(pd.api.indexers.BaseIndexer):
        def get_window_bounds(self, num
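One way to check whether a given aggregation honours a custom indexer is to compare it against a brute-force loop over the same window bounds. The sketch below is an assumption-laden illustration: `MultipleWindowIndexer` is a hypothetical reading of the comment in the truncated class (window length = largest multiple of `modulo` not exceeding the number of values seen so far), `brute_force` is an illustrative helper, and the `get_window_bounds` signature with `step` assumes pandas 1.5 or newer.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer

class MultipleWindowIndexer(BaseIndexer):
    # Hypothetical indexer, not the asker's truncated ModuloIndexer.
    def get_window_bounds(self, num_values=0, min_periods=None,
                          center=None, closed=None, step=None):
        end = np.arange(1, num_values + 1, dtype=np.int64)
        size = (end // self.modulo) * self.modulo   # largest multiple <= end
        start = end - size                          # empty window when size == 0
        return start, end

def brute_force(s, indexer, func):
    """Reference result: apply func to each window slice in plain Python."""
    start, end = indexer.get_window_bounds(len(s))
    return pd.Series(
        [func(s.iloc[a:b]) if b > a else np.nan for a, b in zip(start, end)],
        index=s.index,
    )

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
indexer = MultipleWindowIndexer(modulo=2)   # BaseIndexer stores kwargs as attributes

rolled = s.rolling(indexer, min_periods=1).sum()
expected = brute_force(s, indexer, lambda w: w.sum())
print(pd.DataFrame({"rolled": rolled, "expected": expected}))
```

The same `brute_force` helper can be pointed at `.min()` or `.count()` by swapping the lambda and the rolling method; whether those agree may depend on the pandas version, which is the point of the question.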