问题
I have a data frame like this.
mydf = pd.DataFrame({'a':[1,1,3,3],'b':[np.nan,2,3,6],'c':[1,3,3,9]})
a b c
0 1 NaN 1
1 1 2.0 3
2 3 3.0 3
3 3 6.0 9
I would like to have a resulting dataframe like this.
myResults = pd.concat([mydf.groupby('a').apply(lambda x: (x.b/x.c).max()), mydf.groupby('a').apply(lambda x: (x.b/x.c).min())], axis =1)
myResults.columns = ['max','min']
max min
a
1 0.666667 0.666667
3 1.000000 0.666667
Basically i would like to have max and min of ratio of column b and column c for each group (grouped by column a)
If it possible to achieve this by agg?
I tried mydf.groupby('a').agg([lambda x: (x.b/x.c).max(), lambda x: (x.b/x.c).min()]). It will not work, and seems column name b and c will not be recognized.
Another way i can think of is to add the ratio column first to mydf. i.e. mydf['ratio'] = mydf.b/mydf.c, and then use agg on the updated mydf like mydf.groupby('a')['ratio'],agg[max,min].
Is there a better way to achieve this through agg or other function? In summary, I would like to apply customized function to grouped DataFrame, and the customized function needs to read multiple columns from original DataFrame.
回答1:
You can use a customized function to acheive this.
You can create any number of new columns using any input columns using the below function.
def f(x):
t = {}
t['max'] = (x['b']/x['c']).max()
t['min'] = (x['b']/x['c']).min()
return pd.Series(t)
mydf.groupby('a').apply(f)
Output:
max min
a
1 0.666667 0.666667
3 1.000000 0.666667
来源:https://stackoverflow.com/questions/60196551/python-use-agg-with-more-than-one-customized-function