Question
When I select the NA values like this:
data[data=="na"] <- NA
data[!complete.cases(data),]
I need to replace them, but the replacement depends on the type of distribution. If shapiro.test indicates that a variable is not normally distributed, the missing value must be replaced by the median; if it is normal, it must be replaced by the mean. The distribution has to be assessed separately for each gender (1 = girl, 2 = man).
data=structure(list(sex = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), emotion = c(20L,
15L, 49L, NA, 34L, 35L, 54L, 45L), IQ = c(101L, 98L, 105L, NA,
123L, 120L, 115L, NA)), .Names = c("sex", "emotion", "IQ"), class = "data.frame", row.names = c(NA,
-8L))
The desired output:
sex emotion IQ
1 20 101
1 15 98
1 49 105
1 28 101
2 34 123
2 35 120
2 54 115
2 45 119
Answer 1:
The following code replaces the NA values according to the Shapiro-Wilk test:
library(dplyr)

# Within each sex, fill NA with the group mean if the Shapiro-Wilk test does
# not reject normality (p > 0.05), otherwise with the group median.
data %>%
  group_by(sex) %>%
  mutate(
    emotion = ifelse(!is.na(emotion), emotion,
                     ifelse(shapiro.test(emotion)$p.value > 0.05,
                            mean(emotion, na.rm = TRUE),
                            median(emotion, na.rm = TRUE))),
    IQ = ifelse(!is.na(IQ), IQ,
                ifelse(shapiro.test(IQ)$p.value > 0.05,
                       mean(IQ, na.rm = TRUE),
                       median(IQ, na.rm = TRUE)))
  )
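The same rule can also be written once and reused for every column. The following is only a minimal base-R sketch, not part of the original answer: the helper name impute_by_normality and its alpha argument are hypothetical, the fallback to the median when shapiro.test cannot be run is an assumption, and so is the rounding, which is added only because the desired output in the question shows whole numbers.

```r
# Hypothetical helper (not from the original answer): fill the NAs of one
# vector with the mean if Shapiro-Wilk does not reject normality at `alpha`,
# otherwise with the median.
impute_by_normality <- function(x, alpha = 0.05) {
  observed <- x[!is.na(x)]
  # shapiro.test() needs 3 to 5000 non-missing values that are not all
  # identical; when it cannot be run, this sketch falls back to the median.
  can_test <- length(observed) >= 3 && length(observed) <= 5000 &&
    length(unique(observed)) > 1
  use_mean <- can_test && shapiro.test(observed)$p.value > alpha
  fill <- if (use_mean) mean(observed) else median(observed)
  # Rounding is an assumption, made to match the whole numbers in the
  # desired output.
  x[is.na(x)] <- round(fill)
  x
}

data <- data.frame(
  sex     = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  emotion = c(20L, 15L, 49L, NA, 34L, 35L, 54L, 45L),
  IQ      = c(101L, 98L, 105L, NA, 123L, 120L, 115L, NA)
)

# ave() applies the helper within each sex group and keeps the row order.
result <- data
for (col in c("emotion", "IQ")) {
  result[[col]] <- ave(data[[col]], data$sex, FUN = impute_by_normality)
}
print(result)
```

Note that each group in the sample data has only three observed values next to its NA, which is the minimum shapiro.test accepts; with so few points the test has very little power, so the mean will be chosen in most cases.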
Source: https://stackoverflow.com/questions/51326684/replacing-na-depending-on-distribution-type-of-gender-in-r