Apply different functions to different columns with a single pandas groupby command

Question

My data is stored in df. I have multiple users per group. I want to group df by group and apply different functions to different columns. The twist is that I would like to assign custom names to the new columns during this process.

import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({"user":range(4),"group":[1,1,2,2],"crop":["2018-01-01","2018-01-01","2018-03-01","2018-03-01"],
                   "score":np.random.randint(400,1000,4)})
df["crop"] = pd.to_datetime(df["crop"])
print(df)
   user  group        crop  score
0     0      1  2018-01-01    910
1     1      1  2018-01-01    765
2     2      2  2018-03-01    782
3     3      2  2018-03-01    722

I want to get the mean of score, and the min and max values of crop grouped by group and assign custom names to each new column. The desired output should look like this:

  group  mean_score    min_crop    max_crop
0     1       837.5  2018-01-01  2018-01-01
1     2       752.0  2018-03-01  2018-03-01

I don't know how to do this in a one-liner in Python. In R, I would use data.table and get the following:

df[, list(mean_score = mean(score),
          max_crop   = max(crop),
          min_crop   = min(crop)), by = group]

I know I could group the data and use .agg combined with a dictionary. Is there an alternative way where I can custom-name each column in this process?

Answer 1:

Define a function that computes the required aggregations and returns them as a named Series, then pass it to groupby().apply():

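def f(x):
    # x is the sub-DataFrame for one group; each key becomes an output column
    d = {}
    d['mean_score'] = x['score'].mean()
    d['min_crop'] = x['crop'].min()
    d['max_crop'] = x['crop'].max()
    # The explicit index fixes the column order of the result
    return pd.Series(d, index=['mean_score', 'min_crop', 'max_crop'])

# The result is indexed by group, with one row per group
data = df.groupby('group').apply(f)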
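
As an alternative sketch, assuming pandas 0.25 or later: named aggregation lets .agg take keyword arguments of the form new_name=(column, function), so each output column is named in the same call. The reset_index() at the end is an addition here, used to turn group back into a regular column as in the desired output.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question
np.random.seed(123)
df = pd.DataFrame({
    "user": range(4),
    "group": [1, 1, 2, 2],
    "crop": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-03-01", "2018-03-01"]),
    "score": np.random.randint(400, 1000, 4),
})

# Each keyword is the new column name; the tuple is (source column, aggregation)
result = (
    df.groupby("group")
      .agg(
          mean_score=("score", "mean"),
          min_crop=("crop", "min"),
          max_crop=("crop", "max"),
      )
      .reset_index()  # assumption: group is wanted as a column, not as the index
)
print(result)
```

This mirrors the data.table call quite closely, since every output name sits next to the column and function that produce it.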

Source: https://stackoverflow.com/questions/58326349/apply-different-functions-to-different-columns-with-a-singe-pandas-groupby-comma

Tags

python

pandas

group-by