python pandas error when doing groupby counts

前端未结


When doing groupby counts over multiple columns I get an error. Here is my dataframe and also an example that simply labels the distinct 'b' and 'c' groups.


1 Answer

无人共我

2020-12-16 03:19

Evaluate df.groupby(['b', 'c']).count() in an interactive session:

In [150]: df.groupby(['b', 'c']).count()
Out[150]: 
     a  b  c  d
b c            
0 0  1  1  1  1
  1  1  1  1  1
1 1  2  2  2  2

This is a whole DataFrame, which is probably not what you want to assign to a new column of df. In fact, you cannot assign a DataFrame to a single column, which is why the (admittedly cryptic) exception is raised.
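To make the mismatch concrete, here is a small sketch. The data is only an assumption (a hand-written 4-row frame, not the asker's real one); it shows that the result of count() is a DataFrame with one row per group rather than one value per original row:

```python
import pandas as pd

# Illustrative data only; not the asker's actual dataframe.
df = pd.DataFrame({'a': [1, 1, 1, 0],
                   'b': [1, 1, 0, 1],
                   'c': [0, 1, 0, 1],
                   'd': [0, 1, 1, 0]})

counts = df.groupby(['b', 'c']).count()

print(type(counts).__name__)      # DataFrame
print(counts.shape[0], len(df))   # 3 groups versus 4 original rows
```

Since the result is indexed by the (b, c) groups and not by the original row labels, it does not line up with df as a new column would need to.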

If you wish to create a new column which counts the number of rows in each group, you could use

df['gr'] = df.groupby(['b', 'c'])['a'].transform('count')
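One caveat, in case it matters for your data: 'count' counts the non-null values of 'a' in each group, so it can differ from the number of rows when 'a' contains NaN. If you want the row count regardless of missing values, 'size' should do it. A minimal sketch, using made-up data with one NaN in it:

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value in 'a' (an assumption for illustration).
df = pd.DataFrame({'a': [1.0, np.nan, 1.0, 0.0],
                   'b': [1, 1, 0, 1],
                   'c': [0, 1, 0, 1]})

grouped = df.groupby(['b', 'c'])['a']

non_null = grouped.transform('count')  # non-null values of 'a' per group
n_rows = grouped.transform('size')     # rows per group, NaN included

print(non_null.tolist())  # [1, 1, 1, 1]
print(n_rows.tolist())    # [1, 2, 1, 2]
```

Both results have the same index as df, which is what makes them assignable as a new column.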

For example,

import pandas as pd
import numpy as np
np.random.seed(1)
df = pd.DataFrame(np.random.randint(0, 2, (4, 4)),
                  columns=['a', 'b', 'c', 'd'])
print(df)
#    a  b  c  d
# 0  1  1  0  0
# 1  1  1  1  1
# 2  1  0  0  1
# 3  0  1  1  0

df['gr'] = df.groupby(['b', 'c'])['a'].transform('count')

df['comp_ids'] = df.groupby(['b', 'c']).grouper.group_info[0]
print(df)

yields

   a  b  c  d  gr  comp_ids
0  1  1  0  0   1         1
1  1  1  1  1   2         2
2  1  0  0  1   1         0
3  0  1  1  0   2         2

Notice that df.groupby(['b', 'c']).grouper.group_info[0] is returning something other than the counts of the number of rows in each group. Rather, it is returning a label for each group.
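As a side note, grouper.group_info is an internal attribute, and I believe newer pandas releases have deprecated it. The public ngroup() method should give the same group labels. A sketch, rebuilding the example frame by hand so that it does not depend on the random seed:

```python
import pandas as pd

# Same values as the example frame, written out by hand.
df = pd.DataFrame({'a': [1, 1, 1, 0],
                   'b': [1, 1, 0, 1],
                   'c': [0, 1, 0, 1],
                   'd': [0, 1, 1, 0]})

# ngroup() numbers the groups in sorted key order:
# (0, 0) -> 0, (1, 0) -> 1, (1, 1) -> 2
df['comp_ids'] = df.groupby(['b', 'c']).ngroup()

print(df['comp_ids'].tolist())  # [1, 2, 0, 2]
```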
