python: use agg with more than one customized function

Submitted by 元气小坏坏 on 2020-03-28 06:39:13

Question


I have a data frame like this.

mydf = pd.DataFrame({'a':[1,1,3,3],'b':[np.nan,2,3,6],'c':[1,3,3,9]})

   a    b  c
0  1  NaN  1
1  1  2.0  3
2  3  3.0  3
3  3  6.0  9

I would like to have a resulting dataframe like this.

myResults = pd.concat([mydf.groupby('a').apply(lambda x: (x.b/x.c).max()), mydf.groupby('a').apply(lambda x: (x.b/x.c).min())], axis =1)
myResults.columns = ['max','min']

        max       min
a
1  0.666667  0.666667
3  1.000000  0.666667

Basically, I would like the max and min of the ratio of column b to column c for each group (grouped by column a).

Is it possible to achieve this with agg? I tried mydf.groupby('a').agg([lambda x: (x.b/x.c).max(), lambda x: (x.b/x.c).min()]). It does not work; it seems the column names b and c are not recognized.

Another way I can think of is to add the ratio column to mydf first, i.e. mydf['ratio'] = mydf.b/mydf.c, and then use agg on the updated mydf, like mydf.groupby('a')['ratio'].agg([max, min]).
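For reference, the ratio-column approach described above could look roughly like the sketch below. It uses assign (my choice here, not something from the question) so the temporary ratio column never has to be stored on mydf itself; the variable name result is likewise only illustrative.

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame({'a': [1, 1, 3, 3], 'b': [np.nan, 2, 3, 6], 'c': [1, 3, 3, 9]})

# assign() returns a copy with the extra column, so mydf is left unchanged.
result = (
    mydf.assign(ratio=mydf['b'] / mydf['c'])
        .groupby('a')['ratio']
        .agg(['max', 'min'])   # string names select the built-in reductions
)
print(result)
```

Since the ratio is computed once for the whole frame before grouping, agg only ever has to look at a single column.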

Is there a better way to achieve this through agg or another function? In summary, I would like to apply a customized function to a grouped DataFrame, where the function needs to read multiple columns of the original DataFrame.


Answer 1:


You can use a custom function to achieve this.

The function below can create any number of new columns from any of the input columns.

def f(x):
    t = {}
    t['max'] = (x['b']/x['c']).max()
    t['min'] = (x['b']/x['c']).min()
    return pd.Series(t)

mydf.groupby('a').apply(f)

Output:

        max       min
a                    
1  0.666667  0.666667
3  1.000000  0.666667
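As a rough illustration of why apply works here while the agg attempt from the question does not, the sketch below records what each method passes to the function. As far as I understand, agg with a list of functions calls them on one column at a time, while apply passes the whole sub-frame for each group. The names probe_agg, probe_apply, agg_inputs and apply_inputs are made up for this example, and the columns are selected explicitly with [['b', 'c']], which is my own addition.

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame({'a': [1, 1, 3, 3], 'b': [np.nan, 2, 3, 6], 'c': [1, 3, 3, 9]})

agg_inputs = []    # type names of the objects agg passes to the function
apply_inputs = []  # type names of the objects apply passes to the function

def probe_agg(x):
    # Record the input type, then return a scalar as agg expects.
    agg_inputs.append(type(x).__name__)
    return x.max()

def probe_apply(x):
    # Record the input type; both columns are reachable here.
    apply_inputs.append(type(x).__name__)
    return (x['b'] / x['c']).max()

mydf.groupby('a')[['b', 'c']].agg([probe_agg])
mydf.groupby('a')[['b', 'c']].apply(probe_apply)

print(set(agg_inputs))
print(set(apply_inputs))
```

If agg only ever sees a single-column Series, an expression such as x.b/x.c has nothing to look up inside it, which would explain the failure described in the question.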


Source: https://stackoverflow.com/questions/60196551/python-use-agg-with-more-than-one-customized-function
