Apply different functions to different columns with a single pandas groupby command

╄→尐↘猪︶ㄣ submitted on 2020-04-30 10:57:37

Question


My data is stored in df. I have multiple users per group. I want to group df by group and apply different functions to different columns. The twist is that I would like to assign custom names to the new columns during this process.

import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({"user":range(4),"group":[1,1,2,2],"crop":["2018-01-01","2018-01-01","2018-03-01","2018-03-01"],
                   "score":np.random.randint(400,1000,4)})
df["crop"] = pd.to_datetime(df["crop"])
print(df)
   user  group        crop  score
0     0      1  2018-01-01    910
1     1      1  2018-01-01    765
2     2      2  2018-03-01    782
3     3      2  2018-03-01    722

I want to get the mean of score, and the min and max values of crop grouped by group and assign custom names to each new column. The desired output should look like this:

  group  mean_score    min_crop    max_crop
0     1       837.5  2018-01-01  2018-01-01
1     2       752.0  2018-03-01  2018-03-01

I don't know how to do this in a one-liner in Python. In R, I would use data.table and get the following:

df[, list(mean_score = mean(score),
          max_crop   = max(crop),
          min_crop   = min(crop)), by = group]

I know I could group the data and use .agg combined with a dictionary. Is there an alternative way where I can custom-name each column in this process?
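For reference, the dictionary-based approach mentioned above might look roughly like the sketch below. The scores are copied from the printed frame rather than regenerated, and the name by_dict is only illustrative. It shows why the dictionary form is awkward here: the custom names have to be assigned in a separate step.

```python
import pandas as pd

# Example frame with the values printed in the question.
df = pd.DataFrame({
    "user": range(4),
    "group": [1, 1, 2, 2],
    "crop": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-03-01", "2018-03-01"]),
    "score": [910, 765, 782, 722],
})

# Dictionary form: column name -> list of functions to apply.
by_dict = df.groupby("group").agg({"score": ["mean"], "crop": ["min", "max"]})

# The result carries two-level column labels such as ("score", "mean"),
# so the custom names are assigned afterwards by flattening the columns.
by_dict.columns = ["mean_score", "min_crop", "max_crop"]
by_dict = by_dict.reset_index()
print(by_dict)
```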


Answer 1:


Try defining a function that performs the required operations and applying it with groupby().apply():

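def f(x):
    # x is the sub-DataFrame for one group; collect the named results in a dict
    d = {}
    d['mean_score'] = x['score'].mean()
    d['min_crop'] = x['crop'].min()
    d['max_crop'] = x['crop'].max()
    # The index argument fixes the column order of the result
    return pd.Series(d, index=['mean_score', 'min_crop', 'max_crop'])

# reset_index() turns the group labels back into a regular column,
# matching the desired output in the question
data = df.groupby('group').apply(f).reset_index()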
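As an alternative that stays closer to the data.table one-liner, pandas 0.25 and later support named aggregation, where each keyword passed to .agg is the new column name and its value is a (column, function) pair. The following is a minimal sketch, not part of the original answer. It rebuilds the example frame with the scores printed in the question, and the name result is only illustrative.

```python
import pandas as pd

# Example frame with the values printed in the question.
df = pd.DataFrame({
    "user": range(4),
    "group": [1, 1, 2, 2],
    "crop": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-03-01", "2018-03-01"]),
    "score": [910, 765, 782, 722],
})

# Named aggregation: new_column_name=(source_column, function).
result = (
    df.groupby("group")
      .agg(
          mean_score=("score", "mean"),
          min_crop=("crop", "min"),
          max_crop=("crop", "max"),
      )
      .reset_index()  # make "group" a regular column again
)
print(result)
```

Unlike the apply-based function, this keeps each output column in its natural dtype (float for the mean, datetime for the dates) and avoids calling a Python function once per group.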


Source: https://stackoverflow.com/questions/58326349/apply-different-functions-to-different-columns-with-a-singe-pandas-groupby-comma
