Pandas: filling missing values by mean in each group

耶瑟儿~ 2020-11-22 06:06

This should be straightforward, but the closest thing I've found is this post: pandas: Filling missing values within a group, and I still can't solve my problem...

9 Answers
  •  挽巷 (OP)
    2020-11-22 06:51

    Most of the answers above use "groupby" and "transform" to fill the missing values.

    But I prefer using "groupby" with "apply" to fill the missing values, which I find more intuitive.

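    >>> df['value'] = df.groupby('name', group_keys=False)['value'].apply(lambda x: x.fillna(x.mean()))
    >>> df.isnull().sum().sum()
    0

    Passing group_keys=False keeps the result on df's original index. Since pandas 2.0, apply otherwise prepends the group key to the result's index, and assigning it back to df['value'] fails.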
    

    In short: groupby + apply (lambda) + fillna + mean.
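    For comparison, here is a minimal sketch of one common form of the transform-based fill next to the apply version, on a small made-up frame (the data and the names via_transform / via_apply are illustrative only, not from the question). Both yield a Series aligned to df's original index; the apply call passes group_keys=False so that also holds on pandas 2.x.

```python
import numpy as np
import pandas as pd

# Illustrative single-key data (not the question's frame).
df = pd.DataFrame({
    "name": ["A", "A", "B", "B", "B"],
    "value": [1.0, np.nan, 2.0, np.nan, 4.0],
})

# transform("mean") broadcasts each group's mean to every row of that group;
# fillna then replaces only the NaN cells, aligning on df's index.
via_transform = df["value"].fillna(df.groupby("name")["value"].transform("mean"))

# apply runs the lambda per group; group_keys=False keeps the original row index
# (instead of prepending the group key), so the result can be assigned back.
via_apply = df.groupby("name", group_keys=False)["value"].apply(
    lambda s: s.fillna(s.mean())
)

df["value"] = via_apply
print(df)
```

    Group A has mean 1.0 and group B has mean 3.0, so both versions fill the two NaNs with those values. transform computes each mean once and lets fillna do the replacement; apply does the same work inside the lambda.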

    The same approach also works when you group by multiple columns:

    >>> import numpy as np
    >>> import pandas as pd
    >>> df = pd.DataFrame({'value': [1, np.nan, np.nan, 2, 3, np.nan, np.nan, 4, 3],
    ...                    'name': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C'],
    ...                    'class': list('ppqqrrsss')})
    
    >>> df
       value name class
    0    1.0    A     p
    1    NaN    A     p
    2    NaN    B     q
    3    2.0    B     q
    4    3.0    B     r
    5    NaN    B     r
    6    NaN    C     s
    7    4.0    C     s
    8    3.0    C     s
    
    >>> df['value'] = df.groupby(['name', 'class'], group_keys=False)['value'].apply(lambda x: x.fillna(x.mean()))
    
    >>> df
       value name class
    0    1.0    A     p
    1    1.0    A     p
    2    2.0    B     q
    3    2.0    B     q
    4    3.0    B     r
    5    3.0    B     r
    6    3.5    C     s
    7    4.0    C     s
    8    3.0    C     s
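    If you want this as a reusable step that does not depend on how apply indexes its result across pandas versions, a minimal sketch is a small helper built on transform instead. fill_by_group_mean is a hypothetical name, not a pandas API; it returns a copy rather than modifying the frame in place.

```python
import numpy as np
import pandas as pd


def fill_by_group_mean(frame: pd.DataFrame, keys, column: str) -> pd.DataFrame:
    """Hypothetical helper: return a copy of `frame` with NaNs in `column`
    replaced by the mean of their group.

    `keys` is a single column name or a list of names, as accepted by
    DataFrame.groupby. A group whose values are all NaN has a NaN mean,
    so its rows stay NaN.
    """
    out = frame.copy()
    # One mean per group, broadcast back to every row of that group.
    group_means = out.groupby(keys)[column].transform("mean")
    # Only the NaN cells are replaced; existing values are kept.
    out[column] = out[column].fillna(group_means)
    return out


df = pd.DataFrame({
    "value": [1, np.nan, np.nan, 2, 3, np.nan, np.nan, 4, 3],
    "name": ["A", "A", "B", "B", "B", "B", "C", "C", "C"],
    "class": list("ppqqrrsss"),
})

filled = fill_by_group_mean(df, ["name", "class"], "value")
print(filled)
```

    On this data the result matches the apply output above: (C, s) has mean (4 + 3) / 2 = 3.5, and every other group has exactly one non-missing value, which is copied into its NaN row.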
    
