Question
My data is stored in df. I have multiple users per group. I want to group df by group and apply different functions to different columns. The twist is that I would like to assign custom names to the new columns during this process.
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({"user": range(4), "group": [1, 1, 2, 2],
                   "crop": ["2018-01-01", "2018-01-01", "2018-03-01", "2018-03-01"],
                   "score": np.random.randint(400, 1000, 4)})
df["crop"] = pd.to_datetime(df["crop"])
print(df)
   user  group       crop  score
0     0      1 2018-01-01    910
1     1      1 2018-01-01    765
2     2      2 2018-03-01    782
3     3      2 2018-03-01    722
I want to get the mean of score, and the min and max values of crop grouped by group and assign custom names to each new column. The desired output should look like this:
   group  mean_score   min_crop   max_crop
0      1       837.5 2018-01-01 2018-01-01
1      2       752.0 2018-03-01 2018-03-01
I don't know how to do this in a one-liner in Python. In R, I would use data.table and get the following:
df[, list(mean_score = mean(score),
          max_crop = max(crop),
          min_crop = min(crop)), by = group]
I know I could group the data and use .agg combined with a dictionary. Is there an alternative way where I can custom-name each column in this process?
Answer 1:
Try writing a function that performs the required operations and passing it to groupby().apply():
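def f(x):
    # x is the sub-DataFrame for one group
    d = {}
    d['mean_score'] = x['score'].mean()
    d['min_crop'] = x['crop'].min()
    d['max_crop'] = x['crop'].max()
    # The index fixes the order of the new columns
    return pd.Series(d, index=['mean_score', 'min_crop', 'max_crop'])

# 'group' ends up as the index; call .reset_index() to get it back as a column
data = df.groupby('group').apply(f)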
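As an alternative that is not part of the answer above, pandas 0.25 and later support named aggregation, which comes close to the data.table one-liner from the question. The sketch below assumes such a pandas version; the scores are written out explicitly instead of drawn from np.random so that the example is reproducible.

```python
import pandas as pd

# Same data as in the question, with the scores written out explicitly
df = pd.DataFrame({
    "user": range(4),
    "group": [1, 1, 2, 2],
    "crop": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-03-01", "2018-03-01"]),
    "score": [910, 765, 782, 722],
})

# Named aggregation: new_column=(source_column, aggregation)
# as_index=False keeps 'group' as a regular column instead of the index
result = df.groupby("group", as_index=False).agg(
    mean_score=("score", "mean"),
    min_crop=("crop", "min"),
    max_crop=("crop", "max"),
)
print(result)
```

Each keyword argument names an output column, and the tuple gives the source column and the function to apply, so no separate renaming step is needed.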
Source: https://stackoverflow.com/questions/58326349/apply-different-functions-to-different-columns-with-a-singe-pandas-groupby-comma