问题
I have a dataframe that looks like in the pictures.
Now, I want to add a new column that will show the average of power for each day (given the data is sampled every 5 minutes), but separately for when it is day_or_night (day = 0 in the column, night = 1). I've gotten this far:
train['avg_by_day'][train['day_or_night']==1] = train['power'][train['day_or_night']==1].mean()
train['avg_by_day'][train['day_or_night']==0] = train['power'][train['day_or_night']==0].mean()
but this just adds the average of all the power values that correspond to day, or similarly - night, which isn't what I'm after: a specific average for each day/night separately.
I need something like: train['avg_by_day'] == train.power.mean() when day == 1 and day_or_night == 1, and this for each day.
回答1:
So you want to group the dataframe by day and day_or_night and create a new column with mean power values for each group:
train['avg_by_day'] = train.groupby(['day','day_or_night'])['power']\
.transform('mean')
Maybe you should also include year and month in the grouping columns because otherwise it's going to group the 1st day of every month together, same for the 2nd day and so on.
来源:https://stackoverflow.com/questions/43306199/calculating-mean-of-a-specific-column-by-specific-rows