df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'], 'B' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], 'C' : [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan]})
Output:
A B C 0 foo one NaN 1 bar one bla2 2 foo two NaN 3 bar three bla3 4 foo two NaN 5 bar two NaN 6 foo one NaN 7 foo three NaN
I would like to use groupby in order to count the number of NaN's for the different combinations of foo.
Expected Output (EDIT):
A B C D 0 foo one NaN 2 1 bar one bla2 0 2 foo two NaN 2 3 bar three bla3 0 4 foo two NaN 2 5 bar two NaN 1 6 foo one NaN 2 7 foo three NaN 1
Currently I am trying this:
df['count']=df.groupby(['A'])['B'].isnull().transform('sum')
But this is not working...
Thank You
I think you need groupby
with sum
of NaN
values:
df2 = df.C.isnull().groupby([df['A'],df['B']]).sum().astype(int).reset_index(name='count') print (df2) A B count 0 bar one 0 1 bar three 0 2 bar two 1 3 foo one 2 4 foo three 1 5 foo two 2
If need filter first add boolean indexing
:
df = df[df['A'] == 'foo'] df2 = df.C.isnull().groupby([df['A'],df['B']]).sum().astype(int) print (df2) A B foo one 2 three 1 two 2
Or simplier:
df = df[df['A'] == 'foo'] df2 = df['B'].value_counts() print (df2) one 2 two 2 three 1 Name: B, dtype: int64
EDIT: Solution is very similar, only add transform
:
df['D'] = df.C.isnull().groupby([df['A'],df['B']]).transform('sum').astype(int) print (df) A B C D 0 foo one NaN 2 1 bar one bla2 0 2 foo two NaN 2 3 bar three bla3 0 4 foo two NaN 2 5 bar two NaN 1 6 foo one NaN 2 7 foo three NaN 1
Similar solution:
df['D'] = df.C.isnull() df['D'] = df.groupby(['A','B'])['D'].transform('sum').astype(int) print (df) A B C D 0 foo one NaN 2 1 bar one bla2 0 2 foo two NaN 2 3 bar three bla3 0 4 foo two NaN 2 5 bar two NaN 1 6 foo one NaN 2 7 foo three NaN 1
df[df.A == 'foo'].groupby('b').agg({'C': lambda x: x.isnull().sum()})
returns:
=> C B one 2 three 1 two 2