Pandas count null values in a groupby function

Anonymous (unverified), submitted 2019-12-03 00:56:02

Question:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                       'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                       'C': [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan]})

Output:

         A      B     C
    0  foo    one   NaN
    1  bar    one  bla2
    2  foo    two   NaN
    3  bar  three  bla3
    4  foo    two   NaN
    5  bar    two   NaN
    6  foo    one   NaN
    7  foo  three   NaN

I would like to use groupby to count the number of NaN values in column C for each combination of A and B, and attach that count to every row.

Expected Output (EDIT):

         A      B     C    D
    0  foo    one   NaN    2
    1  bar    one  bla2    0
    2  foo    two   NaN    2
    3  bar  three  bla3    0
    4  foo    two   NaN    2
    5  bar    two   NaN    1
    6  foo    one   NaN    2
    7  foo  three   NaN    1

Currently I am trying this:

df['count']=df.groupby(['A'])['B'].isnull().transform('sum')

But this is not working...

Thank You

Answer 1:

I think you need to flag the NaN values with isnull, then group by A and B and sum the flags:

    df2 = df.C.isnull().groupby([df['A'], df['B']]).sum().astype(int).reset_index(name='count')
    print(df2)

         A      B  count
    0  bar    one      0
    1  bar  three      0
    2  bar    two      1
    3  foo    one      2
    4  foo  three      1
    5  foo    two      2

If you need to filter first, add boolean indexing:

    df = df[df['A'] == 'foo']
    df2 = df.C.isnull().groupby([df['A'], df['B']]).sum().astype(int)
    print(df2)

    A    B
    foo  one      2
         three    1
         two      2

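Or, more simply, count the rows per B. This gives the same result here only because every C value in the foo rows is NaN: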

    df = df[df['A'] == 'foo']
    df2 = df['B'].value_counts()
    print(df2)

    one      2
    two      2
    three    1
    Name: B, dtype: int64

EDIT: The solution is very similar; just use transform instead (applied to the original, unfiltered df):

    df['D'] = df.C.isnull().groupby([df['A'], df['B']]).transform('sum').astype(int)
    print(df)

         A      B     C  D
    0  foo    one   NaN  2
    1  bar    one  bla2  0
    2  foo    two   NaN  2
    3  bar  three  bla3  0
    4  foo    two   NaN  2
    5  bar    two   NaN  1
    6  foo    one   NaN  2
    7  foo  three   NaN  1

Similar solution:

    df['D'] = df.C.isnull()
    df['D'] = df.groupby(['A', 'B'])['D'].transform('sum').astype(int)
    print(df)

         A      B     C  D
    0  foo    one   NaN  2
    1  bar    one  bla2  0
    2  foo    two   NaN  2
    3  bar  three  bla3  0
    4  foo    two   NaN  2
    5  bar    two   NaN  1
    6  foo    one   NaN  2
    7  foo  three   NaN  1
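
For anyone who wants to paste and run the transform approach in one go, here is a minimal self-contained sketch. It rebuilds the DataFrame from the question; the variable name is_missing is only an illustrative choice, and isna is used as the equivalent alias of isnull.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan],
})

# Boolean mask: True where C is missing (illustrative variable name)
is_missing = df['C'].isna()

# Sum the mask within each (A, B) group; transform broadcasts the
# group total back onto every row of that group
df['D'] = is_missing.groupby([df['A'], df['B']]).transform('sum').astype(int)

print(df)
```

Because transform returns a result aligned with the original index, the new column can be assigned directly without a merge.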


Answer 2:

    df[df.A == 'foo'].groupby('B').agg({'C': lambda x: x.isnull().sum()})

returns:

           C
    B
    one    2
    three  1
    two    2
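
As a possible alternative to the lambda, the same per-group NaN count can be derived from two built-in aggregations, since groupby count() skips missing values while size() does not. This is only a sketch of a different technique than the one in the answer above; it covers all (A, B) groups rather than just foo, and the names grouped and nan_counts are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan],
})

grouped = df.groupby(['A', 'B'])['C']

# size() counts all rows per group, count() only the non-missing ones,
# so the difference is the number of NaN values per group
nan_counts = grouped.size() - grouped.count()

print(nan_counts)
```

To restrict the result to foo, as in the answer above, select that level afterwards with nan_counts.loc['foo'].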

