Given a dataframe that logs uses of some books like this:

    Name   Type  ID
0  Book1  ebook   1
1  Book2  paper   2
2  Book3  paper   3
3  Book1  ebook   1
4  Book2  paper   2

How can I count the duplicate rows and add the result as a 'Count' column?
I think as_index=False combined with size should do the trick:
df.groupby(['Name', 'Type', 'ID'], as_index=False).size()
(count() has nothing left to count when every column is a grouping key, so use size(), which counts the rows in each group; in recent pandas the result lands in a column named 'size'.)
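A self-contained sketch of that approach, assuming recent pandas (where as_index=False is honoured by size) and rebuilding the sample frame from the question:

import pandas as pd

# Sample data from the question
df = pd.DataFrame({'Name': ['Book1', 'Book2', 'Book3', 'Book1', 'Book2'],
                   'Type': ['ebook', 'paper', 'paper', 'ebook', 'paper'],
                   'ID': [1, 2, 3, 1, 2]})

# Group on every column, count rows per group, keep the keys as columns
counts = df.groupby(['Name', 'Type', 'ID'], as_index=False).size()
print(counts.rename(columns={'size': 'Count'}))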
You want the following:
In [20]:
df.groupby(['Name','Type','ID']).size().reset_index(name='Count')
Out[20]:
    Name   Type  ID  Count
0  Book1  ebook   1      2
1  Book2  paper   2      2
2  Book3  paper   3      1
In your case the duplicate rows share the same 'Name', 'Type' and 'ID' values, so we can group on all three columns, call size to count the rows in each group, and then reset_index to turn the result back into a DataFrame with a 'Count' column.
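For context, without the reset_index the result of size is a Series indexed by a MultiIndex built from the three keys; reset_index(name='Count') flattens it back into regular columns. A quick sketch, assuming the same df:

df.groupby(['Name', 'Type', 'ID']).size()
# Name   Type   ID
# Book1  ebook  1     2
# Book2  paper  2     2
# Book3  paper  3     1
# dtype: int64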
An alternative approach would be to add the 'Count' column using transform and then call drop_duplicates:
In [25]:
df['Count'] = df.groupby(['Name'])['ID'].transform('count')
df.drop_duplicates()
Out[25]:
    Name   Type  ID  Count
0  Book1  ebook   1      2
1  Book2  paper   2      2
2  Book3  paper   3      1
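The transform above keys off 'Name' alone, which works here because each Name always appears with the same Type and ID. If that weren't guaranteed, a more defensive sketch would group the transform on all three columns; note also that drop_duplicates returns a new frame, so assign the result if you want to keep it:

import pandas as pd

df = pd.DataFrame({'Name': ['Book1', 'Book2', 'Book3', 'Book1', 'Book2'],
                   'Type': ['ebook', 'paper', 'paper', 'ebook', 'paper'],
                   'ID': [1, 2, 3, 1, 2]})

# Broadcast each full-row group's size back onto every row
df['Count'] = df.groupby(['Name', 'Type', 'ID'])['ID'].transform('count')
result = df.drop_duplicates()  # returns a copy; keep the assignment
print(result)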
If you have many columns in a df it makes sense to use df.groupby(['foo']).agg(...), see here. The .agg() function allows you to choose what to do with the columns you don't want to apply operations on. If you just want to keep them, use .agg({'col1': 'first', 'col2': 'first', ...}). Instead of 'first', you can also apply 'sum', 'mean' and others.
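A hedged sketch of that pattern (the column names 'foo', 'col1' and 'col2' are illustrative, matching the placeholders above):

import pandas as pd

df = pd.DataFrame({'foo': ['a', 'a', 'b'],
                   'col1': [1, 2, 3],
                   'col2': [10.0, 20.0, 30.0]})

# Keep the first col1 value per group and sum col2 per group
out = df.groupby(['foo'], as_index=False).agg({'col1': 'first', 'col2': 'sum'})
print(out)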