Python pandas: replace NA with the median or mean of a group in a DataFrame

甜味超标 2021-01-02 10:38

Suppose we have a df:

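    A       B
    apple   1.0
    apple   2.0
    apple   NA
    orange  NA
    orange  7.0
    melon   14.0
    melon   NA
    melon   15.0
    melon   16.0

How can each NA in column B be replaced with the median (or mean) of B within its group in column A?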
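To reproduce the example in pandas, the frame can be built as below. This is a sketch that assumes the NA cells are NaN in a float column, and it uses the nine rows that both answers' outputs show (ending with melon 16.0). The isna/groupby count simply confirms there is one missing value per fruit.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; np.nan stands in for the NA cells.
df = pd.DataFrame({
    "A": ["apple", "apple", "apple", "orange", "orange",
          "melon", "melon", "melon", "melon"],
    "B": [1.0, 2.0, np.nan, np.nan, 7.0, 14.0, np.nan, 15.0, 16.0],
})

# Count missing values in B per group in A (each True counts as 1).
missing_per_group = df["B"].isna().groupby(df["A"]).sum()
print(missing_per_group)
```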
         


        
2 Answers
  • 2021-01-02 11:02

    In R, you can use na.aggregate (from the zoo package) together with data.table to replace each NA with the mean of its group. Convert the data.frame to a data.table with setDT(df), group by 'A', and apply na.aggregate to 'B'.

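    library(zoo)         # provides na.aggregate()
    library(data.table)  # provides setDT() and the := operator
    # Replace each NA in B with the mean of B within its A group
    setDT(df)[, B := na.aggregate(B), by = A]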
    df
    #      A    B
    #1:  apple  1.0
    #2:  apple  2.0
    #3:  apple  1.5
    #4: orange  7.0
    #5: orange  7.0
    #6:  melon 14.0
    #7:  melon 15.0
    #8:  melon 15.0
    #9:  melon 16.0
    
  • 2021-01-02 11:24

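    In pandas, you can use groupby(...).transform to compute a fill value for every row, then pass it to fillna (which returns a new Series rather than modifying df in place):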

    >>> med = df.groupby('A')['B'].transform('median')
    >>> df['B'].fillna(med)
    0     1.0
    1     2.0
    2     1.5
    3     7.0
    4     7.0
    5    14.0
    6    15.0
    7    15.0
    8    16.0
    Name: B, dtype: float64
    
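Both answers fill each NA with a statistic of its own group: the mean in the R answer, the median in the pandas one. The sketch below wraps the pandas transform/fillna pattern so either statistic can be chosen, and it stores the result in a copy of the frame instead of only printing it. The helper name fill_na_by_group and the extra "kiwi" row are hypothetical, added only for illustration.

```python
import numpy as np
import pandas as pd


def fill_na_by_group(frame, group_col, value_col, how="median"):
    """Hypothetical helper: return a copy of frame in which NaN values in
    value_col are replaced by the per-group "median" or "mean" of value_col."""
    if how not in ("median", "mean"):
        raise ValueError("how must be 'median' or 'mean'")
    out = frame.copy()
    # transform broadcasts each group's statistic back onto that group's rows.
    fill = out.groupby(group_col)[value_col].transform(how)
    out[value_col] = out[value_col].fillna(fill)
    return out


rows_a = ["apple", "apple", "apple", "orange", "orange",
          "melon", "melon", "melon", "melon"]
rows_b = [1.0, 2.0, np.nan, np.nan, 7.0, 14.0, np.nan, 15.0, 16.0]
df = pd.DataFrame({"A": rows_a, "B": rows_b})

# Hypothetical extra group ("kiwi") with no observed B values at all.
df_extra = pd.DataFrame({"A": rows_a + ["kiwi"], "B": rows_b + [np.nan]})

filled = fill_na_by_group(df_extra, "A", "B", how="median")
print(filled)
```

A group with no observed values (kiwi here) has a NaN median and mean, so its rows stay NaN; if that matters, they need a separate fallback such as the overall median of the column.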