Replace NA with mode based on ID attribute

限于喜欢 submitted on 2020-01-30 06:06:59

Question


I have a dataset dt, and I want to replace the NA values in each attribute with the mode of that attribute within the same id, as follows:

Before:

 id  att  
  1  v
  1  v
  1  NA
  1  c
  2  c
  2  v
  2  NA
  2  c

The outcome I am looking for is:

 id  att
  1  v
  1  v
  1  v
  1  c
  2  c
  2  v
  2  c
  2  c

I have made some attempts. For example, I found a similar question about replacing NA with the mean (which has a built-in function), so I tried to adapt that code as follows:

for (i in 1:dim(dt)[1]) {
    if (is.na(dt$att[i])) {
      att_mode <-                  # I am stuck here to return the mode of an attribute based on ID
      dt$att[i] <- att_mode 
    }
  }

I found the following function to calculate the mode

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

from the following link: Is there a built-in function for finding the mode?
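
As a quick check of how this function behaves, the sketch below applies it to the att values of one group from the example. It assumes NAs should be dropped before taking the mode, because unique() and match() treat NA like any other value; ties go to the value that appears first.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

x <- c("v", "v", NA, "c")   # att values for id 1 in the example
Mode(x[!is.na(x)])          # most frequent non-NA value
Mode(c("a", "b"))           # tie: which.max() picks the first value seen
Mode(c(NA, NA, "a"))        # NA is counted like any other value
```

Dropping the NAs before the call keeps NA from ever being returned as the "mode" of a group where missing values outnumber everything else.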

But I have no idea how to apply it inside the for loop. I tried the apply and ave functions, but they do not seem to be the right choice because of the differing dimensions.

Could anyone help with how to return the mode in my for loop?

Thank you


Answer 1:


We can use na.aggregate from library(zoo) and specify FUN as Mode. As this is a group-by operation, we can do it with data.table: convert the 'data.frame' to a 'data.table' (setDT(df1)) and, grouped by 'id', apply na.aggregate to 'att'.

library(data.table)
library(zoo)
setDT(df1)[, att:= na.aggregate(att, FUN=Mode), by = id]
df1
#    id att
#1:  1   v
#2:  1   v
#3:  1   v
#4:  1   c
#5:  2   c
#6:  2   v
#7:  2   c
#8:  2   c

A similar option with dplyr

library(dplyr)
df1 %>%
     group_by(id) %>%
     mutate(att = na.aggregate(att, FUN=Mode))

NOTE: Mode is the function from the OP's post. This also assumes that 'att' is of class character.
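
If adding zoo is not an option, the same group-wise fill can probably be done in base R with ave(). The following is only a sketch under stated assumptions: df1 is rebuilt here from the question's example, and fill_mode is a hypothetical helper name, not part of any package.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Example data from the question; att is kept as character
df1 <- data.frame(
  id  = c(1, 1, 1, 1, 2, 2, 2, 2),
  att = c("v", "v", NA, "c", "c", "v", NA, "c"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill NAs in x with the mode of its non-NA values
fill_mode <- function(x) {
  if (all(is.na(x))) return(x)      # nothing to impute from
  x[is.na(x)] <- Mode(x[!is.na(x)])
  x
}

# ave() applies fill_mode within each id group and keeps the row order
df1$att <- ave(df1$att, df1$id, FUN = fill_mode)
df1
```

ave() returns a vector of the same length as its input, so the result can be assigned straight back to the column without an explicit for loop.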



Source: https://stackoverflow.com/questions/35312390/replace-na-with-mode-based-on-id-attribute
