pandas and groupby: how to calculate weighted averages within an agg

问题

I calculate a number of aggregate functions using groupby and agg , because I need different aggregate functions for different variables, e.g. not the sum of all, but sum and mean of x, mean of y, etc.

Is there a way to calculate a weighted average using agg? I have found lots of examples, but none with agg.

I can calculate the weighted average manually, as in the code below (note the lines with **), but I was wondering if there is a more elegant and direct way?

Can I create my own function and use that with agg?

For the sake of clarity, I fully understand there are other solutions, e.g.

Pandas DataFrame aggregate function using multiple columns
groupby weighted average and sum in pandas dataframe
Calculate weighted average with pandas dataframe

and lots, lots more. However, as I said, I am not sure how to implement these solutions with an agg, and I need agg because I need to apply different aggregate functions to different columns (again, not the sum of all, but sum and mean of x, mean of y, etc.).

Thanks!

import numpy as np
import pandas as pd
df= pd.DataFrame(np.random.randint(5,8,(1000,4)), columns=['a','b','c','d'])
**df['c * b']= df['c']* df['b']**
g = df.groupby('a').agg(
        {'b':['sum', lambda x: x.sum() / df['b'] .sum(), 'mean'],
              'c':['sum','mean'], 'd':['sum'],
              'c * b':['sum']})
g.columns = g.columns.map('_'.join)
**g['weighted average of c'] = g['c * b_sum'] / g['b_sum']**

回答1:

Is it possible, but really complicated:

np.random.seed(234)
df= pd.DataFrame(np.random.randint(5,8,(1000,4)), columns=['a','b','c','d'])

wm = lambda x: (x * df.loc[x.index, "c"]).sum() / x.sum()
wm.__name__ = 'wa'

f = lambda x: x.sum() / df['b'] .sum()
f.__name__ = '%'

g = df.groupby('a').agg(
        {'b':['sum', f, 'mean', wm],
         'c':['sum','mean'], 
         'd':['sum']})
g.columns = g.columns.map('_'.join)
print (g)

   d_sum  c_sum    c_mean  b_sum       b_%    b_mean      b_wa
a                                                             
5   2104   2062  5.976812   2067  0.344672  5.991304  5.969521
6   1859   1857  5.951923   1875  0.312656  6.009615  5.954667
7   2058   2084  6.075802   2055  0.342671  5.991254  6.085645

Solution with apply:

def func(x):
#    print (x)
    b1 = x['b'].sum()
    b2 = x['b'].sum() / df['b'].sum()
    b3 = (x['b'] * x['c']).sum() / x['b'].sum()
    b4 = x['b'].mean()

    c1 = x['c'].sum()
    c2 = x['c'].mean()

    d1 = x['d'].sum()
    cols = ['b sum','b %','wa', 'b mean', 'c sum', 'c mean', 'd sum']
    return pd.Series([b1,b2,b3,b4,c1,c2,d1], index=cols)


g = df.groupby('a').apply(func)
print (g)
    b sum       b %        wa    b mean   c sum    c mean   d sum
a                                                                
5  2067.0  0.344672  5.969521  5.991304  2062.0  5.976812  2104.0
6  1875.0  0.312656  5.954667  6.009615  1857.0  5.951923  1859.0
7  2055.0  0.342671  6.085645  5.991254  2084.0  6.075802  2058.0

g.loc['total']=g.sum()
print (g)
        b sum       b %         wa     b mean   c sum     c mean   d sum
a                                                                       
5      2067.0  0.344672   5.969521   5.991304  2062.0   5.976812  2104.0
6      1875.0  0.312656   5.954667   6.009615  1857.0   5.951923  1859.0
7      2055.0  0.342671   6.085645   5.991254  2084.0   6.075802  2058.0
total  5997.0  1.000000  18.009832  17.992173  6003.0  18.004536  6021.0

来源：https://stackoverflow.com/questions/46714555/pandas-and-groupby-how-to-calculate-weighted-averages-within-an-agg

标签

python

pandas

pandas-groupby