Python pandas: replace NA with the median or mean of a group in a DataFrame

甜味超标 2021-01-02 10:38

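Suppose we have a DataFrame df like the one below. How can each NA in column B be replaced with the median or mean of B within the same group in column A?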

    A          B
    apple    1.0
    apple    2.0
    apple     NA
    orange    NA
    orange   7.0
    melon   14.0
    melon     NA
    melon   15.0
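To try this in pandas, the table can be loaded as it appears above. This is a minimal sketch that assumes the data arrives as whitespace-separated text with the literal string NA marking missing values; the RAW string is just the table pasted in. It also computes the per-group mean and median, which are the values the missing entries should become.

```python
import io

import pandas as pd

# The question's sample table as whitespace-separated text ("NA" marks a missing value).
RAW = """\
A          B
apple    1.0
apple    2.0
apple     NA
orange    NA
orange   7.0
melon   14.0
melon     NA
melon   15.0
"""

# read_csv treats the literal string "NA" as missing by default,
# so column B is parsed as float64 with NaN in three rows.
df = pd.read_csv(io.StringIO(RAW), sep=r"\s+")

# Per-group statistics the NaNs should be replaced with
# (groupby skips NaN when computing mean and median).
group_mean = df.groupby("A")["B"].mean()
group_median = df.groupby("A")["B"].median()

print(df)
print(group_mean)
print(group_median)
```

With only two observed values per group, mean and median coincide here (apple 1.5, melon 14.5, orange 7.0); they would differ for larger groups.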
         


        
  •  误落风尘
    2021-01-02 11:02

    In R, you can use na.aggregate (from the zoo package) together with data.table to replace each NA with the mean of its group: convert the 'data.frame' to a 'data.table' with setDT(df), group by 'A', and apply na.aggregate to 'B'.

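    library(zoo)         # provides na.aggregate()
    library(data.table)  # provides setDT() and :=

    # Sample data from the question
    df <- data.frame(
      A = c("apple", "apple", "apple", "orange", "orange", "melon", "melon", "melon"),
      B = c(1.0, 2.0, NA, NA, 7.0, 14.0, NA, 15.0)
    )

    # Convert df to a data.table by reference, then within each group of A
    # replace the NAs in B with the group mean (na.aggregate's default FUN is mean).
    setDT(df)[, B := na.aggregate(B), by = A]
    # For the group median instead: setDT(df)[, B := na.aggregate(B, FUN = median), by = A]
    df
    #      A    B
    #1:  apple  1.0
    #2:  apple  2.0
    #3:  apple  1.5
    #4: orange  7.0
    #5: orange  7.0
    #6:  melon 14.0
    #7:  melon 14.5
    #8:  melon 15.0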
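Since the question asks about pandas, a comparable result can be sketched with groupby plus transform. This is an analogue of the R approach, not a port of zoo's internals: transform returns each row's group statistic aligned with the original index, and fillna uses it only where B is missing. The helper fill_na_by_group is a hypothetical name chosen for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the question's sample data; np.nan plays the role of R's NA.
df = pd.DataFrame(
    {
        "A": ["apple", "apple", "apple", "orange", "orange", "melon", "melon", "melon"],
        "B": [1.0, 2.0, np.nan, np.nan, 7.0, 14.0, np.nan, 15.0],
    }
)


def fill_na_by_group(frame, key, col, how="mean"):
    """Hypothetical helper: return a copy of frame with NaN in col replaced
    by the per-group statistic named by `how` (e.g. "mean" or "median")."""
    out = frame.copy()
    # transform broadcasts each group's statistic back to every row of that group,
    # so the result lines up with out[col] and can be passed straight to fillna.
    out[col] = out[col].fillna(out.groupby(key)[col].transform(how))
    return out


filled_mean = fill_na_by_group(df, "A", "B", how="mean")
filled_median = fill_na_by_group(df, "A", "B", how="median")
print(filled_mean)
print(filled_median)
```

Passing how="median" gives the median variant. A group whose values are all missing gets a NaN statistic, so its rows stay NaN after fillna.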
    
