Question
I would like to get rolling means and standard deviations over multiple periods for several columns simultaneously.
This is the code I am using for rolling(5):
def add_mean_std_cols(df):
    res = df.rolling(5).agg(['mean','std'])
    res.columns = res.columns.map('_'.join)
    cols = np.concatenate(list(zip(df.columns, res.columns[0::2], res.columns[1::2])))
    final = res.join(df).loc[:, cols]
    return final
I would like to get the rolling (5), (15), (30), and (45) periods in the same operation.
I thought about iterating over periods but do not know how to avoid getting the rolling mean/std of the rolling mean/std...
Answer 1:
I would suggest creating a DataFrame with a MultiIndex as its columns. There's no way around using a loop here to iterate over your windows, but each pass rolls over the original df, so no statistic is ever computed on top of another rolling statistic. The resulting form is easy to index and easy to read with pd.read_csv. Initialize an empty DataFrame with np.empty of the appropriate shape and use .loc to assign its values.
import numpy as np
import pandas as pd
np.random.seed(123)
df = pd.DataFrame(np.random.randn(100,3)).add_prefix('col')
windows = [5, 15, 30, 45]
stats = ['mean', 'std']
cols = pd.MultiIndex.from_product([windows, df.columns, stats],
                                  names=['window', 'feature', 'metric'])
df2 = pd.DataFrame(np.empty((df.shape[0], len(cols))), columns=cols,
                   index=df.index)
for window in windows:
    df2.loc[:, window] = df.rolling(window=window).agg(stats).values
Now you have a result df2 that has the same index as your original object. It has 3 column levels: the first is the window, the second is the columns from your original frame, and the third is the statistic.
print(df2.shape)
(100, 24)
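As an alternative sketch (not the method above), the same three-level layout can probably be built with pd.concat on a dict of per-window results, which avoids preallocating with np.empty. The names pieces and alt are only illustrative, and the seed and sample data are repeated so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')

windows = [5, 15, 30, 45]
stats = ['mean', 'std']

# Each window is computed from the original df, never from an earlier
# result, so no "rolling mean of a rolling mean" can occur.
pieces = {window: df.rolling(window).agg(stats) for window in windows}

# The dict keys become the outermost column level.
alt = pd.concat(pieces, axis=1)
alt.columns = alt.columns.set_names(['window', 'feature', 'metric'])

print(alt.shape)  # (100, 24)
```

The result has the same index and the same three column levels as df2.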
This makes it easy to check values for a specific rolling window:
print(df2[5]) # Rolling window = 5
feature     col0              col1              col2
metric      mean      std     mean      std     mean      std
0            NaN      NaN      NaN      NaN      NaN      NaN
1            NaN      NaN      NaN      NaN      NaN      NaN
2            NaN      NaN      NaN      NaN      NaN      NaN
3            NaN      NaN      NaN      NaN      NaN      NaN
4       -0.87879  1.45348 -0.26559  0.71236  0.53233  0.89430
..           ...      ...      ...      ...      ...      ...
95      -0.44231  1.02552 -1.22138  0.45140 -0.36440  0.95324
96      -0.58638  1.10246 -0.90165  0.79723 -0.44543  1.00166
97      -0.70564  0.85711 -0.42644  1.07174 -0.44766  1.00284
98      -0.95702  1.01302 -0.03705  1.05066  0.16437  1.32341
99      -0.57026  1.10978  0.08730  1.02438  0.39930  1.31240
print(df2[5]['col0']) # Rolling window = 5, stats of col0 only
metric     mean      std
0           NaN      NaN
1           NaN      NaN
2           NaN      NaN
3           NaN      NaN
4      -0.87879  1.45348
..          ...      ...
95     -0.44231  1.02552
96     -0.58638  1.10246
97     -0.70564  0.85711
98     -0.95702  1.01302
99     -0.57026  1.10978
print(df2.loc[:, (5, slice(None), 'mean')])  # Rolling window = 5, means of each column
window         5
feature     col0     col1     col2
metric      mean     mean     mean
0            NaN      NaN      NaN
1            NaN      NaN      NaN
2            NaN      NaN      NaN
3            NaN      NaN      NaN
4       -0.87879 -0.26559  0.53233
..           ...      ...      ...
95      -0.44231 -1.22138 -0.36440
96      -0.58638 -0.90165 -0.44543
97      -0.70564 -0.42644 -0.44766
98      -0.95702 -0.03705  0.16437
99      -0.57026  0.08730  0.39930
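Selections that cut across windows can be written the same way. The following is a minimal sketch, assuming a frame with the (window, feature, metric) column levels described above; it rebuilds such a frame with pd.concat purely so the snippet is self-contained, and the names all_means and col1_std are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
windows = [5, 15, 30, 45]
stats = ['mean', 'std']

# Rebuild a frame with the same (window, feature, metric) column levels.
df2 = pd.concat({w: df.rolling(w).agg(stats) for w in windows}, axis=1)
df2.columns = df2.columns.set_names(['window', 'feature', 'metric'])

# All means, for every window and column; the 'metric' level is dropped.
all_means = df2.xs('mean', axis=1, level='metric')

# Standard deviations of col1 for the two shortest windows only.
idx = pd.IndexSlice
col1_std = df2.loc[:, idx[[5, 15], 'col1', 'std']]

print(all_means.shape)  # (100, 12)
print(col1_std.shape)   # (100, 2)
```

Using the level name with xs keeps the selection readable when you do not want to spell out slice(None) for the other levels.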
And lastly, to make a single-indexed DataFrame, build flat column names with itertools.product. The names must follow the same order as the concatenated values: window first, then column, then statistic.
import itertools
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
# Flat names such as 'col0_mean_5', ordered window -> column -> statistic
# to match the column order of the concatenated arrays below.
names = ['_'.join([col, stat, str(window)])
         for window, col, stat in itertools.product(windows, df.columns, stats)]
# One (n_rows, n_columns * n_stats) array per window, joined side by side.
values = [df.rolling(window=window).agg(stats).values for window in windows]
df2 = pd.DataFrame(np.concatenate(values, axis=1), columns=names,
                   index=df.index)
Answer 2:
You can concatenate the output of multiple rolling aggregations:
windows = (5, 15, 30, 45)
rolling_dfs = (df.rolling(i)                                    # 1. Create window
                 .agg(['mean', 'std'])                          # 1. Aggregate
                 .rename(columns={col: '{0}_{1:d}'.format(col, i)
                                  for col in df.columns},
                         level=0)                               # 2. Rename columns (top level)
               for i in windows)                                # For each window
pd.concat((df, *rolling_dfs), axis=1)                           # 3. Concatenate dataframes
This is not pretty but should do what you're looking for from what I understand.
What it does:
- creates a generator rolling_dfs with the aggregated dataframes for each rolling window size.
- renames all columns so you can tell which rolling window size each one refers to.
- concatenates the original df with the rolling windows.
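The three steps above can also be sketched with flat column names, which is closer to the layout in the question. This is only a sketch under assumptions: the helper name add_rolling_stats and the '<col>_<stat>_<window>' naming scheme are hypothetical and do not come from either answer.

```python
import numpy as np
import pandas as pd


def add_rolling_stats(df, windows, stats=('mean', 'std')):
    """Return df plus flat columns named '<col>_<stat>_<window>'.

    Hypothetical helper for illustration; not part of pandas.
    """
    pieces = [df]
    for window in windows:
        # Always roll over the original df, never over a previous result.
        rolled = df.rolling(window).agg(list(stats))
        # Flatten ('col0', 'mean') into 'col0_mean_5'.
        rolled.columns = [f'{col}_{stat}_{window}' for col, stat in rolled.columns]
        pieces.append(rolled)
    return pd.concat(pieces, axis=1)


np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
out = add_rolling_stats(df, windows=(5, 15, 30, 45))

print(out.shape)              # (100, 27)
print(list(out.columns[:5]))  # ['col0', 'col1', 'col2', 'col0_mean_5', 'col0_std_5']
```

Flattening the names before concatenating keeps every column label a plain string, so the original columns and the rolling columns share one ordinary index.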
Source: https://stackoverflow.com/questions/46144352/pandas-multiple-rolling-periods