Pandas DataFrame: conditional .mean() depending on values in certain columns

终归单人心 2020-12-31 14:08

I'm trying to create a new column that holds the mean of values from an existing column in the same df. However, the mean should be computed based on a grouping in three

2 answers
  •  北海茫月
    2020-12-31 14:32

    Here's one way to do it:

    In [19]: def cust_mean(grp):
       ....:     grp['mean'] = grp['option_value'].mean()
       ....:     return grp
       ....:
    
    In [20]: o2.groupby(['YEAR', 'daytype', 'hourtype'], group_keys=False).apply(cust_mean)
    Out[20]:
       YEAR daytype hourtype  scenario  option_value       mean
    0  2015     SAT     of_h         0      0.134499  28.282946
    1  2015     SUN     of_h         1     63.019250  63.019250
    2  2015      WD     of_h         2     52.113516  52.113516
    3  2015      WD     pk_h         3     43.126513  43.126513
    4  2015     SAT     of_h         4     56.431392  28.282946
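
    A self-contained sketch of the same idea, assuming o2 looks like the five rows printed above (the question's full o2 isn't shown, so the data is rebuilt from that output). It applies a small function to the option_value column of each group rather than to the whole group, a slight variant of cust_mean that avoids handing the grouping columns to apply (pandas 2.2 deprecates that). The helper name broadcast_mean is hypothetical.

```python
import pandas as pd

# Rebuilt from the printed rows (assumption: the question's full o2 isn't shown).
o2 = pd.DataFrame({
    "YEAR": [2015] * 5,
    "daytype": ["SAT", "SUN", "WD", "WD", "SAT"],
    "hourtype": ["of_h", "of_h", "of_h", "pk_h", "of_h"],
    "scenario": [0, 1, 2, 3, 4],
    "option_value": [0.134499, 63.019250, 52.113516, 43.126513, 56.431392],
})
keys = ["YEAR", "daytype", "hourtype"]


def broadcast_mean(values: pd.Series) -> pd.Series:
    # Hypothetical helper: repeat the group's mean on every row, keeping the rows' original index.
    return pd.Series(values.mean(), index=values.index)


# group_keys=False stops pandas from prepending the group keys to the index,
# so the result lines up with o2's rows when assigned as a column.
o2["mean"] = o2.groupby(keys, group_keys=False)["option_value"].apply(broadcast_mean)
print(o2)
```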
    

    So, what was going wrong with your attempt?

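    A plain groupby mean returns an aggregate indexed by the group keys: one row per group rather than one row per original row, so its shape differs from the original DataFrame.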

    In [21]: o2.groupby(['YEAR', 'daytype', 'hourtype'])['option_value'].mean()
    Out[21]:
    YEAR  daytype  hourtype
    2015  SAT      of_h        28.282946
          SUN      of_h        63.019250
          WD       of_h        52.113516
                   pk_h        43.126513
    Name: option_value, dtype: float64
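
    To see the shape mismatch concretely, here is a sketch on the same five rows (rebuilt from the output above, as an assumption). Because the aggregate has one row per group, getting a row-level column from it means joining it back on the group keys; merge is one way to do that, shown here as an alternative to the apply and transform approaches.

```python
import pandas as pd

# Rebuilt from the printed rows (assumption: the question's full o2 isn't shown).
o2 = pd.DataFrame({
    "YEAR": [2015] * 5,
    "daytype": ["SAT", "SUN", "WD", "WD", "SAT"],
    "hourtype": ["of_h", "of_h", "of_h", "pk_h", "of_h"],
    "scenario": [0, 1, 2, 3, 4],
    "option_value": [0.134499, 63.019250, 52.113516, 43.126513, 56.431392],
})
keys = ["YEAR", "daytype", "hourtype"]

# Aggregate: one value per (YEAR, daytype, hourtype) group, indexed by those keys.
agg = o2.groupby(keys)["option_value"].mean()
print(o2.shape, agg.shape)  # (5, 5) (4,)

# Join the group means back onto the original rows by the key columns;
# how="left" keeps every row of o2 in its original order.
merged = o2.merge(agg.rename("mean").reset_index(), on=keys, how="left")
print(merged)
```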
    

    Or use transform, which returns a result with the same index as the original DataFrame:

    In [1461]: o2['premium'] = (o2.groupby(['YEAR', 'daytype', 'hourtype'])['option_value']
                                  .transform('mean'))
    
    In [1462]: o2
    Out[1462]:
       YEAR daytype hourtype  scenario  option_value    premium
    0  2015     SAT     of_h         0      0.134499  28.282946
    1  2015     SUN     of_h         1     63.019250  63.019250
    2  2015      WD     of_h         2     52.113516  52.113516
    3  2015      WD     pk_h         3     43.126513  43.126513
    4  2015     SAT     of_h         4     56.431392  28.282946
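
    A self-contained version of the transform approach, again assuming o2 is the five rows shown above. It checks that the transformed Series shares o2's index, which is what lets it be assigned directly as a column.

```python
import pandas as pd

# Rebuilt from the printed rows (assumption: the question's full o2 isn't shown).
o2 = pd.DataFrame({
    "YEAR": [2015] * 5,
    "daytype": ["SAT", "SUN", "WD", "WD", "SAT"],
    "hourtype": ["of_h", "of_h", "of_h", "pk_h", "of_h"],
    "scenario": [0, 1, 2, 3, 4],
    "option_value": [0.134499, 63.019250, 52.113516, 43.126513, 56.431392],
})
keys = ["YEAR", "daytype", "hourtype"]

# transform returns one value per original row (same index as o2),
# so the group mean can be assigned straight back as a new column.
premium = o2.groupby(keys)["option_value"].transform("mean")
print(premium.index.equals(o2.index))  # True
o2["premium"] = premium
print(o2)
```

    With a string name like "mean", transform can use pandas' built-in grouped mean instead of calling a Python function per group, and its result is always aligned to the original rows, so it is usually the more direct choice for this kind of column.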
    
