Sum duplicated rows on a multi-index pandas dataframe

帅比萌擦擦* submitted on 2019-12-11 00:22:32

Question


Hello, I'm having trouble with pandas. I'm trying to sum duplicated rows on a multi-index DataFrame. I tried df.groupby(level=[0,1]).sum(), df.stack().reset_index().groupby(['year', 'product']).sum() and a few others, but I cannot get it to work. I'd also like every unique product to appear under each year, with a value of 0 if it wasn't listed for that year.

Example: dataframe with multi-index and 3 different products (A,B,C):

                  volume1    volume2
year   product
2010   A          10         12
       A          7          3
       B          7          7
2011   A          10         10
       B          7          6
       C          5          5

Expected output: if a product is duplicated within a year, its rows are summed. If a product isn't listed for a year, a new row full of 0s is created for it.

                  volume1     volume2
year   product
2010   A          17          15
       B          7           7
       C          0           0
2011   A          10          10
       B          7           6
       C          5           5

Any ideas? Thanks.


Answer 1:


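Use groupby and sum with unstack and stack:

df = df.groupby(level=[0,1]).sum().unstack(fill_value=0).stack()
#on older pandas this could also be written as
#df = df.sum(level=[0,1]).unstack(fill_value=0).stack()
#but the level argument of sum was removed in pandas 2.0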

Alternative with reindex:

df = df.groupby(level=[0,1]).sum()
#on older pandas, same as
#df = df.sum(level=[0,1])
#build every (year, product) combination from the index levels
mux = pd.MultiIndex.from_product(df.index.levels, names = df.index.names)
df = df.reindex(mux, fill_value=0)

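Alternative 2, thanks @Wen:

#unstack without fill_value leaves NaN for the missing rows, so fill them afterwards
df = df.groupby(level=[0,1]).sum().unstack().stack(dropna=False).fillna(0).astype(int)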

print (df)
              volume1  volume2
year product                  
2010 A             17       15
     B              7        7
     C              0        0
2011 A             10       10
     B              7        6
     C              5        5



Answer 2:


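You can make the second level of the index a CategoricalIndex; groupby will then include every category under each year, even the ones that were not observed.

#inplace was removed from set_levels in pandas 2.0, so assign the result back
df.index = df.index.set_levels(pd.CategoricalIndex(df.index.levels[1]), level=1)
#observed=False keeps the categories that do not occur in a given year
df.groupby(level=[0, 1], observed=False).sum().fillna(0)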

              volume1  volume2
year product                  
2010 A             17       15
     B              7        7
     C              0        0
2011 A             10       10
     B              7        6
     C              5        5


Source: https://stackoverflow.com/questions/48830485/sum-duplicated-rows-on-a-multi-index-pandas-dataframe
