Pandas: filling missing values by mean in each group

前端未结

关注

 9  1178

耶瑟儿～ 2020-11-22 06:06

This should be straightforward, but the closest thing I\'ve found is this post: pandas: Filling missing values within a group, and I still can\'t solve my problem....

9条回答

挽巷 (楼主)

2020-11-22 06:51

Most of above answers involved using "groupby" and "transform" to fill the missing values.

But i prefer using "groupby" with "apply" to fill the missing values which is more intuitive to me.

>>> df['value']=df.groupby('name')['value'].apply(lambda x:x.fillna(x.mean()))
>>> df.isnull().sum().sum()
    0

Shortcut: Groupby + Apply/Lambda + Fillna + Mean

This solution still works if you want to group by multiple columns to replace missing values.

     >>> df = pd.DataFrame({'value': [1, np.nan, np.nan, 2, 3, np.nan,np.nan, 4, 3], 
    'name': ['A','A', 'B','B','B','B', 'C','C','C'],'class':list('ppqqrrsss')})  

     >>> df
   value name   class
0    1.0    A     p
1    NaN    A     p
2    NaN    B     q
3    2.0    B     q
4    3.0    B     r
5    NaN    B     r
6    NaN    C     s
7    4.0    C     s
8    3.0    C     s

>>> df['value']=df.groupby(['name','class'])['value'].apply(lambda x:x.fillna(x.mean()))

>>> df
        value name   class
    0    1.0    A     p
    1    1.0    A     p
    2    2.0    B     q
    3    2.0    B     q
    4    3.0    B     r
    5    3.0    B     r
    6    3.5    C     s
    7    4.0    C     s
    8    3.0    C     s

0 讨论(0)

查看其它9个回答