Suppose we have a df:
A B
apple 1.0
apple 2.0
apple NA
orange NA
orange 7.0
melon 14.0
melon NA
melon 15.0
melon 16.0
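For the pandas part, the same data can be built as a DataFrame. This is a sketch that assumes np.nan stands in for R's NA. The rows match the nine-row outputs shown below, including the final melon value of 16.0 that those outputs imply.

```python
import numpy as np
import pandas as pd

# Example data; np.nan marks the missing values (NA in R).
df = pd.DataFrame({
    "A": ["apple", "apple", "apple", "orange", "orange",
          "melon", "melon", "melon", "melon"],
    "B": [1.0, 2.0, np.nan, np.nan, 7.0, 14.0, np.nan, 15.0, 16.0],
})
print(df)
```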
In R, we can use na.aggregate (from the zoo package) together with data.table to replace each NA by the mean value of its group. We convert the 'data.frame' to a 'data.table' (setDT(df)), group by 'A', and apply na.aggregate to 'B'.
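library(zoo)        # provides na.aggregate (its default FUN is mean)
library(data.table) # provides setDT and := (update by reference)
# Within each group of A, replace the NAs in B by the mean of that group's non-NA values
setDT(df)[, B := na.aggregate(B), by = A]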
df
# A B
#1: apple 1.0
#2: apple 2.0
#3: apple 1.5
#4: orange 7.0
#5: orange 7.0
#6: melon 14.0
#7: melon 15.0
#8: melon 15.0
#9: melon 16.0
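The grouped fill above can be sketched in plain Python to show the mechanics: compute the mean of the observed values in each group, then substitute it wherever a value is missing. This is an illustration of the idea, not zoo's actual implementation. The helper name fill_na_with_group_mean is hypothetical, and in this sketch a group with no observed values simply keeps NaN.

```python
import math
from collections import defaultdict


def fill_na_with_group_mean(keys, values):
    """Replace NaN in `values` by the mean of the non-NaN values sharing the same key.

    Hypothetical helper for illustration; groups with no observed values keep NaN.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    # First pass: accumulate sum and count of observed (non-NaN) values per group.
    for k, v in zip(keys, values):
        if not math.isnan(v):
            sums[k] += v
            counts[k] += 1
    # Second pass: substitute the group mean for each missing value.
    filled = []
    for k, v in zip(keys, values):
        if math.isnan(v) and counts[k] > 0:
            filled.append(sums[k] / counts[k])
        else:
            filled.append(v)
    return filled


nan = float("nan")
A = ["apple", "apple", "apple", "orange", "orange", "melon", "melon", "melon", "melon"]
B = [1.0, 2.0, nan, nan, 7.0, 14.0, nan, 15.0, 16.0]
print(fill_na_with_group_mean(A, B))
# [1.0, 2.0, 1.5, 7.0, 7.0, 14.0, 15.0, 15.0, 16.0]
```

For example, apple's observed values are 1.0 and 2.0, so its NA becomes (1.0 + 2.0) / 2 = 1.5. Melon's observed values are 14.0, 15.0 and 16.0, so its NA becomes 15.0.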
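In pandas, you can use groupby(...).transform to compute each row's group mean (aligned with the original index) and pass it to fillna. Pass 'median' instead of 'mean' to fill with the group median; for this data both give the same result.
>>> means = df.groupby('A')['B'].transform('mean')
>>> df['B'] = df['B'].fillna(means)
>>> df['B']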
0 1.0
1 2.0
2 1.5
3 7.0
4 7.0
5 14.0
6 15.0
7 15.0
8 16.0
Name: B, dtype: float64
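An alternative to computing the fill values separately is to do the fill inside each group with a lambda passed to transform. This sketch rebuilds the example data so it runs on its own. It uses only standard pandas calls: groupby, transform, fillna and mean.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["apple", "apple", "apple", "orange", "orange",
          "melon", "melon", "melon", "melon"],
    "B": [1.0, 2.0, np.nan, np.nan, 7.0, 14.0, np.nan, 15.0, 16.0],
})

# Fill each group's missing B values with that group's own mean, in one step.
# transform returns a Series aligned with df's original index, so it can be assigned back.
df["B"] = df.groupby("A")["B"].transform(lambda s: s.fillna(s.mean()))
print(df["B"].tolist())
# [1.0, 2.0, 1.5, 7.0, 7.0, 14.0, 15.0, 15.0, 16.0]
```

The lambda runs a Python function once per group. With many groups, the built-in 'mean' aggregation combined with a single fillna call is usually faster.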