Question
I built the following function to estimate an optimal exponential moving average of a pandas DataFrame column.
import pandas as pd
from scipy import optimize
from sklearn.metrics import mean_squared_error

## Function that finds best alpha and uses it to create ewma
def find_best_ewma(series, eps=10e-5):
    def f(alpha):
        # minimize passes alpha as a 1-element array; ewm expects a scalar
        ewm = series.shift().ewm(alpha=float(alpha[0]), adjust=False).mean()
        return mean_squared_error(series, ewm.fillna(0))
    result = optimize.minimize(f, .3, bounds=[(0 + eps, 1 - eps)])
    # result.x is a 1-element array, so extract the scalar optimum
    return series.shift().ewm(alpha=float(result.x[0]), adjust=False).mean()
Now I want to apply this function to each of the groups created using pandas-groupby on the following test df:
## test
      data1     data2 key1 key2
0 -0.018442 -1.564270    a    x
1 -0.038490 -1.504290    b    x
2  0.953920 -0.283246    a    x
3 -0.231322 -0.223326    b    y
4 -0.741380  1.458798    c    z
5 -0.856434  0.443335    d    y
6 -1.416564  1.196244    c    z
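For anyone who wants to reproduce this, the frame above can be rebuilt by hand. This is a minimal sketch that simply copies the printed values into a dict; the variable name `test` matches the calls used further down.

```python
import pandas as pd

# Values copied from the printed frame above
test = pd.DataFrame(
    {
        "data1": [-0.018442, -0.038490, 0.953920, -0.231322,
                  -0.741380, -0.856434, -1.416564],
        "data2": [-1.564270, -1.504290, -0.283246, -0.223326,
                  1.458798, 0.443335, 1.196244],
        "key1": ["a", "b", "a", "b", "c", "d", "c"],
        "key2": ["x", "x", "x", "y", "z", "y", "z"],
    }
)
print(test)
```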
To do so, I tried the following two ways:
## First way
test.groupby(["key1","key2"])["data1"].apply(find_best_ewma)
## Output
0         NaN
1         NaN
2   -0.018442
3         NaN
4         NaN
5         NaN
6   -0.741380
Name: data1, dtype: float64
## Second way
test.groupby(["key1","key2"]).apply(lambda g: find_best_ewma(g["data1"]))
## Output
key1  key2
a     x     0         NaN
            2   -0.018442
b     x     1         NaN
      y     3         NaN
c     z     4         NaN
            6   -0.741380
d     y     5         NaN
Name: data1, dtype: float64
Both ways produce a pandas.core.series.Series, but only the second way provides the expected hierarchical index.
I do not understand why the first way does not produce the hierarchical index and instead returns the original dataframe index. Could you please explain to me why this happens?
What am I missing?
Thanks in advance for your help.
Answer 1:
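The first way creates a pandas.core.groupby.DataFrameGroupBy object, which becomes a pandas.core.groupby.SeriesGroupBy object once you select a specific column from it. It is on this object that 'apply' is called, hence a Series is returned. Because 'find_best_ewma' hands back a Series indexed exactly like the group it received, the pieces are put back together on the original index and the group keys are not added.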
test.groupby(["key1","key2"])["data1"]#.apply(find_best_ewma)
<pandas.core.groupby.SeriesGroupBy object at 0x7fce51fac790>
The second way keeps working on the DataFrameGroupBy object. The lambda you pass selects the column inside each group, so 'find_best_ewma' receives the "data1" values of one group at a time, but 'apply' itself is called on the original DataFrameGroupBy. The per-group results are therefore combined under the group keys, and the 'magic' is that key1 and key2 are still present as the outer levels of the index.
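If the hierarchical index is wanted regardless of which of the two objects is used, one option is to build it explicitly instead of relying on what 'apply' decides to do, which has varied between pandas versions. The following is a minimal sketch, not the method used in the question: `shifted_ewma` is a hypothetical stand-in for 'find_best_ewma' with a fixed alpha of 0.3 (so scipy and sklearn are not needed), and the per-group results are stitched together with pd.concat, whose dict keys become the outer index levels.

```python
import pandas as pd

# Only the columns needed here, copied from the question's test frame
test = pd.DataFrame(
    {
        "data1": [-0.018442, -0.038490, 0.953920, -0.231322,
                  -0.741380, -0.856434, -1.416564],
        "key1": ["a", "b", "a", "b", "c", "d", "c"],
        "key2": ["x", "x", "x", "y", "z", "y", "z"],
    }
)


def shifted_ewma(series, alpha=0.3):
    # Hypothetical stand-in for find_best_ewma: fixed alpha, no optimisation
    return series.shift().ewm(alpha=alpha, adjust=False).mean()


# Iterating a groupby yields (key, group) pairs; with two grouping
# columns each key is a tuple such as ("a", "x")
pieces = {
    key: shifted_ewma(group)
    for key, group in test.groupby(["key1", "key2"])["data1"]
}

# The dict keys become the outer index levels; the original row
# labels are kept as the innermost level
result = pd.concat(pieces, names=["key1", "key2"])
print(result)
```

The same dict comprehension works with the real 'find_best_ewma' in place of the stand-in, since it only relies on the function taking a Series and returning one.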
Source: https://stackoverflow.com/questions/49500560/pandas-groupby-and-apply-method-with-custom-function