Pandas: Multiple rolling periods

Question

I would like to compute rolling means and standard deviations over multiple periods, for several columns simultaneously.

This is the code I am using for rolling(5):

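import numpy as np
import pandas as pd

def add_mean_std_cols(df):
    # Rolling mean and std over a 5-row window for every column
    res = df.rolling(5).agg(['mean','std'])

    # Flatten ('col', 'mean') -> 'col_mean'
    res.columns = res.columns.map('_'.join)

    # Interleave: original column, its mean, its std
    cols = np.concatenate(list(zip(df.columns, res.columns[0::2], res.columns[1::2])))

    final = res.join(df).loc[:, cols]
    return final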

I would like to get the rolling (5), (15), (30), and (45) periods in the same operation.

I thought about iterating over periods but do not know how to avoid getting the rolling mean/std of the rolling mean/std...

Answer 1:

I would suggest creating a DataFrame with a MultiIndex as its columns. I don't see a way around using a loop here to iterate over your windows. The resulting form is easy to index, and it can be written to CSV and read back with pd.read_csv (passing header=[0, 1, 2] for the three column levels). Initialize an empty DataFrame with np.empty of the appropriate shape and use .loc to assign its values.

import numpy as np
import pandas as pd
np.random.seed(123)

df = pd.DataFrame(np.random.randn(100,3)).add_prefix('col')

windows = [5, 15, 30, 45]
stats = ['mean', 'std']
cols = pd.MultiIndex.from_product([windows, df.columns, stats], 
                                  names=['window', 'feature', 'metric'])

df2 = pd.DataFrame(np.empty((df.shape[0], len(cols))), columns=cols,
                   index=df.index)

for window in windows:
    df2.loc[:, window] = df.rolling(window=window).agg(stats).values

Now you have a result df2 that has the same index as your original object. It has 3 column levels: the first is the window, the second is the columns from your original frame, and the third is the statistic.

print(df2.shape)
(100, 24)
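
As a possible alternative to preallocating with np.empty, the same three-level layout can probably be built by handing the per-window results to pd.concat with keys=. This is only a sketch, not part of the original answer; the name df2_alt is made up for illustration, and the data mirrors the example above.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')

windows = [5, 15, 30, 45]
stats = ['mean', 'std']

# One aggregated frame per window, each computed from the ORIGINAL df,
# so no rolling statistic is ever taken of another rolling statistic.
pieces = [df.rolling(window=w).agg(stats) for w in windows]

# keys= adds the window size as the outermost column level.
df2_alt = pd.concat(pieces, axis=1, keys=windows)
df2_alt.columns = df2_alt.columns.set_names(['window', 'feature', 'metric'])

print(df2_alt.shape)  # (100, 24)
```

Because every piece is computed from df itself, the loop never feeds one window's output into the next.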

This makes it easy to check values for a specific rolling window:

print(df2[5])  # Rolling window = 5
feature     col0              col1              col2         
metric      mean      std     mean      std     mean      std
0            NaN      NaN      NaN      NaN      NaN      NaN
1            NaN      NaN      NaN      NaN      NaN      NaN
2            NaN      NaN      NaN      NaN      NaN      NaN
3            NaN      NaN      NaN      NaN      NaN      NaN
4       -0.87879  1.45348 -0.26559  0.71236  0.53233  0.89430
..           ...      ...      ...      ...      ...      ...
95      -0.44231  1.02552 -1.22138  0.45140 -0.36440  0.95324
96      -0.58638  1.10246 -0.90165  0.79723 -0.44543  1.00166
97      -0.70564  0.85711 -0.42644  1.07174 -0.44766  1.00284
98      -0.95702  1.01302 -0.03705  1.05066  0.16437  1.32341
99      -0.57026  1.10978  0.08730  1.02438  0.39930  1.31240

print(df2[5]['col0'])  # Rolling window = 5, stats of col0 only
metric     mean      std
0           NaN      NaN
1           NaN      NaN
2           NaN      NaN
3           NaN      NaN
4      -0.87879  1.45348
..          ...      ...
95     -0.44231  1.02552
96     -0.58638  1.10246
97     -0.70564  0.85711
98     -0.95702  1.01302
99     -0.57026  1.10978

print(df2.loc[:, (5, slice(None), 'mean')]) # Rolling window = 5,
                                            # means of each column
window         5                  
feature     col0     col1     col2
metric      mean     mean     mean
0            NaN      NaN      NaN
1            NaN      NaN      NaN
2            NaN      NaN      NaN
3            NaN      NaN      NaN
4       -0.87879 -0.26559  0.53233
..           ...      ...      ...
95      -0.44231 -1.22138 -0.36440
96      -0.58638 -0.90165 -0.44543
97      -0.70564 -0.42644 -0.44766
98      -0.95702 -0.03705  0.16437
99      -0.57026  0.08730  0.39930
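
The selections above can also be written with pd.IndexSlice and DataFrame.xs, which some people find easier to read than raw tuples. The sketch below rebuilds a frame with the same column levels (using pd.concat purely for brevity, which is an assumption and not how the answer builds it) and shows three cross-sections.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
windows = [5, 15, 30, 45]
stats = ['mean', 'std']

# Same (window, feature, metric) column layout as df2 in the answer.
df2 = pd.concat([df.rolling(w).agg(stats) for w in windows],
                axis=1, keys=windows)
df2.columns = df2.columns.set_names(['window', 'feature', 'metric'])

idx = pd.IndexSlice

# Window 5, every feature, means only (all three levels are kept).
means_5 = df2.loc[:, idx[5, :, 'mean']]

# Means for every window and feature; xs drops the 'metric' level.
all_means = df2.xs('mean', axis=1, level='metric')

# Std of col0 across all windows; only the 'window' level remains.
col0_std = df2.xs(('col0', 'std'), axis=1, level=['feature', 'metric'])

print(means_5.shape, all_means.shape, col0_std.shape)
```

xs drops the levels it selects on, while .loc with an IndexSlice keeps them, so the choice depends on whether the remaining labels should stay hierarchical.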

And lastly, to make a DataFrame with single-level columns, build the flat names in the same order as the concatenated values: window first, then column, then statistic.

df = pd.DataFrame(np.random.randn(100,3)).add_prefix('col')

# Each rolling result is ordered col0 mean, col0 std, col1 mean, ...
# and the windows are concatenated side by side, so the names must
# follow the same nesting: window -> column -> statistic.
names = ['{}_{}_{}'.format(col, stat, win)
         for win in windows for col in df.columns for stat in stats]

df2 = [df.rolling(window=window).agg(stats).values for window in windows]
df2 = pd.DataFrame(np.concatenate(df2, axis=1), columns=names,
                   index=df.index)
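
If a frame with (window, feature, metric) column levels is already at hand, the flat names can also be derived directly from its column tuples, which keeps labels and values aligned by construction. A minimal sketch, assuming such a frame (called wide here, a made-up name) built with pd.concat:

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
windows = [5, 15, 30, 45]
stats = ['mean', 'std']

# Hypothetical stand-in for the MultiIndex-column result.
wide = pd.concat([df.rolling(w).agg(stats) for w in windows],
                 axis=1, keys=windows)

# Each column label is a (window, feature, metric) tuple; join it
# into a single string such as 'col0_mean_5'.
flat = wide.copy()
flat.columns = ['{}_{}_{}'.format(feature, metric, window)
                for window, feature, metric in wide.columns]

print(flat.columns[:4].tolist())
# ['col0_mean_5', 'col0_std_5', 'col1_mean_5', 'col1_std_5']
```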

Answer 2:

You can concatenate output of multiple rolling aggregations:

windows = (5, 15, 30, 45)
rolling_dfs = (df.rolling(i)                                    # 1. Create window
                 .agg(['mean', 'std'])                          # 1. Aggregate
                 .rename(columns={col: '{0}_{1:d}'.format(col, i)
                                  for col in df.columns})       # 2. Rename columns
               for i in windows)                                # For each window

pd.concat((df, *rolling_dfs), axis=1)                           # 3. Concatenate dataframes

This is not pretty but should do what you're looking for from what I understand.

What it does:

1. Creates a generator rolling_dfs with the aggregated dataframes for each rolling window size.
2. Renames all columns so you can know which rolling window size each one refers to.
3. Concatenates the original df with the rolling windows.
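
The three steps can be wrapped in a small helper that also flattens the (column, statistic) labels into plain strings. This is a sketch under stated assumptions: add_rolling_stats and its naming scheme (column_stat_window) are hypothetical, not something from the answer or from pandas.

```python
import numpy as np
import pandas as pd

def add_rolling_stats(df, windows=(5, 15, 30, 45), stats=('mean', 'std')):
    """Hypothetical helper: append rolling stats for each window to df."""
    pieces = [df]
    for w in windows:
        # Always roll over the original df, never over earlier results.
        res = df.rolling(w).agg(list(stats))
        # Flatten ('col0', 'mean') -> 'col0_mean_5' for window 5.
        res.columns = ['{}_{}_{}'.format(col, stat, w)
                       for col, stat in res.columns]
        pieces.append(res)
    return pd.concat(pieces, axis=1)

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
out = add_rolling_stats(df)
print(out.shape)  # (100, 27)
```

Since each window is computed from df inside the loop, iterating over periods does not produce a rolling mean of a rolling mean.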

Source: https://stackoverflow.com/questions/46144352/pandas-multiple-rolling-periods

Tags

python

python-2.7

pandas