Pandas groupby sum

落花浮王杯 submitted on 2019-12-12 04:12:33

Question


I have a dataframe as follows:

ref, type, amount
001, foo, 10
001, foo, 5
001, bar, 50
001, bar, 5
001, test, 100
001, test, 90
002, foo, 20
002, foo, 35
002, bar, 75
002, bar, 80
002, test, 150
002, test, 110

This is what I'm trying to get:

ref, type, amount, foo, bar, test
001, foo, 10, 15, 55, 190
001, foo, 5, 15, 55, 190
001, bar, 50, 15, 55, 190
001, bar, 5, 15, 55, 190
001, test, 100, 15, 55, 190
001, test, 90, 15, 55, 190
002, foo, 20, 55, 155, 260
002, foo, 35, 55, 155, 260
002, bar, 75, 55, 155, 260
002, bar, 80, 55, 155, 260
002, test, 150, 55, 155, 260
002, test, 110, 55, 155, 260

So I have this:

df.groupby('ref')['amount'].transform(sum)

But how can I restrict it so that each new column sums only the rows of one type (foo, bar or test)?


Answer 1:


A solution using a pivot table:

>>> b = pd.pivot_table(df, values='amount', index=['ref'], columns=['type'], aggfunc='sum')
>>> b
type  bar  foo  test
ref
1      55   15   190
2     155   55   260

>>> pd.merge(df, b, left_on='ref', right_index=True)
    ref  type  amount  bar  foo  test
0     1   foo      10   55   15   190
1     1   foo       5   55   15   190
2     1   bar      50   55   15   190
3     1   bar       5   55   15   190
4     1  test     100   55   15   190
5     1  test      90   55   15   190
6     2   foo      20  155   55   260
7     2   foo      35  155   55   260
8     2   bar      75  155   55   260
9     2   bar      80  155   55   260
10    2  test     150  155   55   260
11    2  test     110  155   55   260
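
The session above assumes df already exists. A minimal self-contained sketch of the same pivot-then-merge idea follows; the DataFrame is rebuilt from the question's sample rows, ref is stored as a string (an assumption, so that the leading zeros survive), and the names totals and result are only illustrative. The column reordering is added to match the foo, bar, test order requested in the question.

```python
import pandas as pd

# Sample data copied from the question; ref is kept as a string so the
# leading zeros survive (an assumption about the original dtype).
df = pd.DataFrame({
    "ref": ["001"] * 6 + ["002"] * 6,
    "type": ["foo", "foo", "bar", "bar", "test", "test"] * 2,
    "amount": [10, 5, 50, 5, 100, 90, 20, 35, 75, 80, 150, 110],
})

# One row per ref, one column per type, each cell holding the summed amount.
totals = pd.pivot_table(df, values="amount", index="ref",
                        columns="type", aggfunc="sum")

# Reorder the columns to match the layout asked for in the question.
totals = totals[["foo", "bar", "test"]]

# Attach the per-ref totals to every original row.
result = df.merge(totals, left_on="ref", right_index=True)
print(result)
```

Passing the string "sum" as aggfunc avoids the numpy import that the original call relies on.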



Answer 2:


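I think you need groupby with unstack, and then merge the result back into the original DataFrame. (The sums column in the final output below is not created by the code shown here; it was presumably already present in df as a per-(ref, type) total.)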

df1 = df.groupby(['ref','type'])['amount'].sum().unstack().reset_index()
print(df1)
type  ref  bar  foo  test
0     001   55   15   190
1     002  155   55   260

df = pd.merge(df, df1, on='ref')
print(df)
    ref  type  amount  sums  bar  foo  test
0   001   foo      10    15   55   15   190
1   001   foo       5    15   55   15   190
2   001   bar      50    55   55   15   190
3   001   bar       5    55   55   15   190
4   001  test     100   190   55   15   190
5   001  test      90   190   55   15   190
6   002   foo      20    55  155   55   260
7   002   foo      35    55  155   55   260
8   002   bar      75   155  155   55   260
9   002   bar      80   155  155   55   260
10  002  test     150   260  155   55   260
11  002  test     110   260  155   55   260
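
One case the sample data does not exercise is a ref that lacks one of the types: unstack then leaves NaN in that cell. The sketch below uses a made-up three-row frame (hypothetical data, not the question's) to show how the fill_value argument of unstack can fill such gaps with 0; the name merged is illustrative.

```python
import pandas as pd

# Hypothetical data: ref "002" has no "bar" rows.
df = pd.DataFrame({
    "ref": ["001", "001", "002"],
    "type": ["foo", "bar", "foo"],
    "amount": [10, 50, 20],
})

# Sum per (ref, type), then pivot type into columns; fill_value=0 replaces
# the missing ("002", "bar") combination instead of leaving NaN.
df1 = (df.groupby(["ref", "type"])["amount"]
         .sum()
         .unstack(fill_value=0)
         .reset_index())

# Attach the per-ref totals to every original row.
merged = pd.merge(df, df1, on="ref")
print(merged)
```

Whether 0 or NaN is the right value for a missing type depends on how the totals are used afterwards, so treat the fill as a choice rather than a requirement.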

Timings:

In [506]: %timeit (pd.merge(df, df.groupby(['ref','type'])['amount'].sum().unstack().reset_index(), on='ref'))
100 loops, best of 3: 3.4 ms per loop

In [507]: %timeit (pd.merge(df, pd.pivot_table(df, values='amount', index=['ref'], columns=['type'], aggfunc=np.sum), left_on='ref', right_index=True))
100 loops, best of 3: 4.99 ms per loop
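
For comparison, the same columns can also be built without a merge, staying closer to the transform call in the question. This is a sketch of an alternative, not one of the approaches posted above, and it has not been timed against them: for each type it zeroes out the amounts of all other rows with Series.where and then broadcasts the per-ref sum. The names out and masked are illustrative, and the sample data is rebuilt from the question with ref as a string.

```python
import pandas as pd

# Sample data copied from the question (ref kept as a string).
df = pd.DataFrame({
    "ref": ["001"] * 6 + ["002"] * 6,
    "type": ["foo", "foo", "bar", "bar", "test", "test"] * 2,
    "amount": [10, 5, 50, 5, 100, 90, 20, 35, 75, 80, 150, 110],
})

out = df.copy()
for t in ["foo", "bar", "test"]:
    # Keep amounts on rows of this type, set every other row to 0,
    # then broadcast the per-ref sum back onto all rows of that ref.
    masked = df["amount"].where(df["type"] == t, 0)
    out[t] = masked.groupby(df["ref"]).transform("sum")

print(out)
```

This version needs the list of types up front, whereas the pivot and unstack approaches discover them from the data.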


Source: https://stackoverflow.com/questions/39750590/pandas-groupby-sum
