问题
Hello I'm having troubles dealing with Pandas. I'm trying to sum duplicated rows on a multiindex Dataframe.
I tryed with df.groupby(level=[0,1]).sum()
, also with df.stack().reset_index().groupby(['year', 'product']).sum()
and some others, but I cannot get it to work.
I'd also like to add every unique product for each given year and give them a 0 value if they weren't listed.
Example: dataframe with multi-index and 3 different products (A,B,C):
volume1 volume2
year product
2010 A 10 12
A 7 3
B 7 7
2011 A 10 10
B 7 6
C 5 5
Expected output : if there are duplicated products for a given year then we sum them. If one of the products isnt listed for a year, we create a new row full of 0.
volume1 volume2
year product
2010 A 17 15
B 7 7
C 0 0
2011 A 10 10
B 7 6
C 5 5
Any idea ? Thanks
回答1:
Use sum
with unstack and stack:
df = df.sum(level=[0,1]).unstack(fill_value=0).stack()
#same as
#df = df.groupby(level=[0,1]).sum().unstack(fill_value=0).stack()
Alternative with reindex:
df = df.sum(level=[0,1])
#same as
#df = df.groupby(level=[0,1]).sum()
mux = pd.MultiIndex.from_product(df.index.levels, names = df.index.names)
df = df.reindex(mux, fill_value=0)
Alternative1, thanks @Wen:
df = df.sum(level=[0,1]).unstack().stack(dropna=False)
print (df)
volume1 volume2
year product
2010 A 17 15
B 7 7
C 0 0
2011 A 10 10
B 7 6
C 5 5
回答2:
You can make the second level of the index a CategoricalIndex
and when you use groupby
it will include all of the categories.
df.index.set_levels(pd.CategoricalIndex(df.index.levels[1]), 1, inplace=True)
df.groupby(level=[0, 1]).sum().fillna(0, downcast='infer')
volume1 volume2
year product
2010 A 17 15
B 7 7
C 0 0
2011 A 10 10
B 7 6
C 5 5
来源:https://stackoverflow.com/questions/48830485/sum-duplicated-rows-on-a-multi-index-pandas-dataframe