Pandas Groupby and apply method with custom function

Question

I built the following function to estimate an optimal exponential moving average of a pandas DataFrame column.

from scipy import optimize
from sklearn.metrics import mean_squared_error
import pandas as pd
## Function that finds the best alpha and uses it to create the ewma
def find_best_ewma(series, eps=10e-5):

    def f(alpha):
        # optimize.minimize passes alpha as a 1-element array
        ewm = series.shift().ewm(alpha=float(alpha[0]), adjust=False).mean()
        return mean_squared_error(series, ewm.fillna(0))

    result = optimize.minimize(f, .3, bounds=[(0+eps, 1-eps)])

    # result.x is a 1-element array, so take the scalar alpha out of it
    return series.shift().ewm(alpha=float(result.x[0]), adjust=False).mean()
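
For readers unfamiliar with the expression inside the function, here is a minimal sketch of what series.shift().ewm(alpha=alpha, adjust=False).mean() computes: an exponentially weighted average of the previous observations only. The helper manual_lagged_ewma and the sample values are hypothetical and serve only to illustrate the recursion; they are not part of the original question.

```python
import pandas as pd


def manual_lagged_ewma(values, alpha):
    """Hypothetical helper: EWMA of the *previous* observations, as a plain loop."""
    out = [float("nan")]  # the first observation has no history
    level = None
    for prev in values[:-1]:
        # adjust=False recursion: level_t = (1 - alpha) * level_{t-1} + alpha * x_t
        level = prev if level is None else (1 - alpha) * level + alpha * prev
        out.append(level)
    return out


series = pd.Series([1.0, 2.0, 4.0, 8.0])  # illustrative values only
alpha = 0.5

from_pandas = series.shift().ewm(alpha=alpha, adjust=False).mean()
by_hand = manual_lagged_ewma(series.tolist(), alpha)

print(from_pandas.tolist())
print(by_hand)
```

The first element is NaN because shift() leaves nothing to average there, which is why every group's first row is NaN in the outputs below.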

Now I want to apply this function to each of the groups created with pandas groupby on the following test DataFrame:

## test
      data1     data2 key1 key2
0 -0.018442 -1.564270    a    x
1 -0.038490 -1.504290    b    x
2  0.953920 -0.283246    a    x
3 -0.231322 -0.223326    b    y
4 -0.741380  1.458798    c    z
5 -0.856434  0.443335    d    y
6 -1.416564  1.196244    c    z

To do so, I tried the following two ways:

## First way
test.groupby(["key1","key2"])["data1"].apply(find_best_ewma)
## Output
0         NaN
1         NaN
2   -0.018442
3         NaN
4         NaN
5         NaN
6   -0.741380
Name: data1, dtype: float64

## Second way
test.groupby(["key1","key2"]).apply(lambda g: find_best_ewma(g["data1"]))
## Output
key1  key2   
a     x     0         NaN
            2   -0.018442
b     x     1         NaN
      y     3         NaN
c     z     4         NaN
            6   -0.741380
d     y     5         NaN
Name: data1, dtype: float64

Both ways produce a pandas.core.series.Series but ONLY the second way provides the expected hierarchical index.

I do not understand why the first way does not produce the hierarchical index and instead returns the original DataFrame index. Could you please explain why this happens?

What am I missing?

Thanks in advance for your help.

Answer 1:

The first way creates a pandas.core.groupby.DataFrameGroupBy object, which becomes a pandas.core.groupby.SeriesGroupBy object once you select a specific column from it. It is on this object that the 'apply' method is called, so 'find_best_ewma' receives each group's 'data1' values as a Series, and a Series is returned.

test.groupby(["key1","key2"])["data1"]#.apply(find_best_ewma)
<pandas.core.groupby.SeriesGroupBy object at 0x7fce51fac790>

The second way keeps the DataFrameGroupBy object. The function you pass to 'apply' selects the column inside each group, so 'find_best_ewma' is applied to the 'data1' column of each group, but the 'apply' method itself is called on the original DataFrameGroupBy. The per-group results are then combined under their group keys (into a Series here, as your output shows, rather than a DataFrame). The 'magic' is that the key1/key2 index levels are therefore still present.
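
A small sketch to check the two object types and compare the resulting indexes yourself. It rebuilds the test frame from the question (data2 is omitted because it is not used), and lagged is a hypothetical stand-in for find_best_ewma that only shifts the values, so the example runs without scipy or scikit-learn. Treat the version remark as an assumption to verify on your install: as far as I know, older pandas releases drop the group keys when the applied function returns a like-indexed result, while pandas 2.0 and later add them whenever group_keys=True (the default), so the first count printed below may differ between versions.

```python
import pandas as pd

# Test frame from the question, without the unused data2 column.
test = pd.DataFrame({
    "data1": [-0.018442, -0.038490, 0.953920, -0.231322,
              -0.741380, -0.856434, -1.416564],
    "key1": ["a", "b", "a", "b", "c", "d", "c"],
    "key2": ["x", "x", "x", "y", "z", "y", "z"],
})
keys = ["key1", "key2"]


def lagged(series):
    """Hypothetical stand-in for find_best_ewma: same length and index as its input."""
    return series.shift()


by_frame = test.groupby(keys)
by_column = by_frame["data1"]
print(type(by_frame).__name__)   # DataFrameGroupBy
print(type(by_column).__name__)  # SeriesGroupBy

first = by_column.apply(lagged)                        # "first way"
second = by_frame.apply(lambda g: lagged(g["data1"]))  # "second way"

# Number of index levels in each result; the first one is version dependent.
print(first.index.nlevels, second.index.nlevels)

# Asking explicitly for no group keys keeps the original row labels.
flat = test.groupby(keys, group_keys=False)["data1"].apply(lagged)
print(flat.index.nlevels)
```

Whatever the index looks like, both ways compute the same values; only the labelling of the rows differs.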

Source: https://stackoverflow.com/questions/49500560/pandas-groupby-and-apply-method-with-custom-function

Tags

python

pandas

pandas-groupby