I have a dataset with some missing data that looks like this:
id category value
1 A NaN
2 B NaN
3 A 10.5
I think you can use groupby and apply with fillna to fill each NaN with the mean of its group. You will still get NaN if some category has only NaN values, so fill whatever remains with the mean of the whole column:
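# group_keys=False keeps the original row index, so the result aligns with df on assignment
df['value'] = df.groupby('category', group_keys=False)['value'].apply(lambda x: x.fillna(x.mean()))
df['value'] = df['value'].fillna(df['value'].mean())
print(df)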
id category value
0 1 A 6.25
1 2 B 1.00
2 3 A 10.50
3 4 C 4.15
4 5 A 2.00
5 6 B 1.00
You can also use GroupBy + transform to fill NaN values with group-wise means. This method avoids the inefficient apply + lambda. For example:
df['value'] = df['value'].fillna(df.groupby('category')['value'].transform('mean'))
df['value'] = df['value'].fillna(df['value'].mean())
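To make the two steps above easy to try, here is a minimal self-contained sketch. The DataFrame in it is made up for illustration (it is not the original dataset), and category B is deliberately given only NaN values so that the fallback to the column mean is exercised.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen so that category B has no observed values.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "C", "A"],
    "value": [np.nan, np.nan, 10.5, 4.0, 2.0],
})

# Step 1: fill each NaN with the mean of its own category.
# transform('mean') returns a Series aligned with df's index.
group_means = df.groupby("category")["value"].transform("mean")
df["value"] = df["value"].fillna(group_means)

# Step 2: category B is still NaN (its group mean is NaN),
# so fall back to the mean of the whole column.
df["value"] = df["value"].fillna(df["value"].mean())

print(df)
```

Note that the fallback mean in step 2 is computed after step 1, so it includes the values that were already filled in. If you would rather use the mean of the originally observed values only, compute `df['value'].mean()` before step 1 and pass that to the second fillna.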