This should be an easy one, but somehow I couldn't find a solution that works.
I have a pandas dataframe which looks like this:
index col1 col2
Another generic solution is:
df.groupby(['col1','col2']).agg({'col3':'sum','col4':'sum'}).reset_index()
This will give you the required output.
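Here is a minimal runnable sketch of the dict-based agg call above. The question's original table is truncated, so the sample DataFrame is a hypothetical stand-in chosen to reproduce the totals shown further down in this thread.

```python
import pandas as pd

# Hypothetical sample data (the question's original table is not shown)
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
})

# Sum col3 and col4 within each (col1, col2) group, then turn the
# group keys back into ordinary columns with reset_index()
result = (
    df.groupby(["col1", "col2"])
      .agg({"col3": "sum", "col4": "sum"})
      .reset_index()
)
print(result)
```

Without the final reset_index(), col1 and col2 stay in a MultiIndex instead of being regular columns.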
UPDATE (June 2020): pandas 0.25.0 introduced “named aggregation” for groupby. You pass keyword arguments whose values are (column, aggregation function) tuples, and each keyword becomes the name of an output column. This is handy when applying multiple aggregation functions to specific columns.
df.groupby(['col1', 'col2']).agg(
    sum_col3=('col3', 'sum'),
    sum_col4=('col4', 'sum'),
).reset_index()
See the pandas documentation on named aggregation for a detailed description.
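A small sketch of named aggregation, assuming pandas 0.25.0 or later. The sample data is hypothetical, and the extra mean_col3 column is added only to show two different functions applied to the same source column.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
})

# Each keyword names an output column; its value is a
# (source column, aggregation function) tuple
result = df.groupby(["col1", "col2"]).agg(
    sum_col3=("col3", "sum"),
    sum_col4=("col4", "sum"),
    mean_col3=("col3", "mean"),  # second function on the same column
).reset_index()
print(result)
```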
The issue is likely that df.col3.dtype is not an int or another numeric datatype. Try df.col3 = df.col3.astype(int) before doing your groupby.
Additionally, select your columns after the groupby to see if the columns are even being aggregated:
df_new = df.groupby(['col1', 'col2']).sum()[["col3", "col4"]]
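A sketch of the dtype check and fix described above. It assumes, hypothetically, that col3 was loaded as strings (for example from a CSV with quoted numbers); the exact dtype name printed before the conversion can vary between pandas versions.

```python
import pandas as pd

# Hypothetical data where col3 holds numbers stored as strings
df = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": ["c", "c", "d"],
    "col3": ["1", "2", "5"],
    "col4": [10, 20, 40],
})

print(df.col3.dtype)  # a string/object dtype, not int64

# Convert to a real integer dtype before grouping
df.col3 = df.col3.astype(int)

# Aggregate, then select the columns you care about
df_new = df.groupby(["col1", "col2"]).sum()[["col3", "col4"]]
print(df_new)
```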
Using apply:
df.groupby(['col1', 'col2'])[["col3", "col4"]].apply(lambda x: x.astype(int).sum())
           col3  col4
col1 col2
a    c        2     4
     d        1     2
b    d        1     2
     e        2     4
If you want to use agg:
df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'})
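To check that the apply and agg versions above produce the same per-group totals, here is a sketch on hypothetical sample data (numeric columns assumed, so the astype(int) inside apply is a no-op here).

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
})

# Approach 1: apply a function to each group's sub-DataFrame
via_apply = df.groupby(["col1", "col2"])[["col3", "col4"]].apply(
    lambda x: x.astype(int).sum()
)

# Approach 2: agg with a column -> function mapping
via_agg = df.groupby(["col1", "col2"]).agg({"col3": "sum", "col4": "sum"})

print(via_apply)
print(via_agg)
```

agg with built-in function names is usually the simpler and faster choice; apply is mainly useful when each group needs custom logic, such as the type conversion here.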
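I think it would be more efficient to do the following (note that the trailing .sum(axis=1) adds col3 and col4 together, giving one combined total per group rather than one column per sum):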
df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'}).sum(axis=1)
or:
df.groupby(['col1', 'col2'])[['col3', 'col4']].sum().sum(axis=1)
This assumes the columns already have appropriate (numeric) types in the DataFrame.
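A sketch showing what the chained .sum(axis=1) actually returns, using hypothetical sample data: the per-group col3 and col4 sums are added together, leaving a Series rather than a two-column DataFrame.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
})

# One column of sums per input column
per_column = df.groupby(["col1", "col2"])[["col3", "col4"]].sum()

# Adding across the row collapses col3 + col4 into a single
# Series with one combined total per (col1, col2) group
combined = per_column.sum(axis=1)
print(combined)
```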
The above answer didn't work for me.
df_new = df.groupby(['col1', 'col2']).sum()[["col3", "col4"]]
I was grouping by a single column and summing another column.
Here is what worked for me:
D1.groupby(['col1'])['col2'].sum()  # select the column first; .sum() goes at the end, not in the middle
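A sketch of the single-column version above. D1 is the answerer's DataFrame, which isn't shown, so a hypothetical stand-in is used here. Selecting one column with ['col2'] gives a Series indexed by the col1 values.

```python
import pandas as pd

# Hypothetical stand-in for the D1 frame in the answer above
D1 = pd.DataFrame({"col1": ["x", "x", "y"], "col2": [1, 2, 5]})

# Select the column, then sum: the result is a Series indexed by col1
totals = D1.groupby(["col1"])["col2"].sum()
print(totals)

# reset_index() turns it back into a DataFrame with col1 as a column
totals_df = totals.reset_index()
print(totals_df)
```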