R: outlier cleaning for each column in a dataframe by using quantiles 0.05 and 0.95

限于喜欢 submitted on 2019-12-04 14:07:35

Please don't do this. This is not a good strategy for dealing with outliers - particularly since it's unlikely that 10% of your data are outliers!
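To make that concrete, here is a minimal sketch, assuming a made-up sample `x` drawn from a normal distribution (so it has no real outliers) and an arbitrary seed. It counts how many values the 0.05/0.95 rule would overwrite anyway:

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
x <- rnorm(1000)                  # hypothetical clean sample, no real outliers

q <- quantile(x, c(0.05, 0.95))   # 5th and 95th percentiles
flagged <- x < q[1] | x > q[2]    # values the clipping rule would overwrite

mean(flagged)                     # share of values treated as "outliers"
```

The share comes out at roughly 10% by construction, whether or not the data contain anything unusual.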

I can't think of a function in R that does this, but you can define a small one yourself:

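foo <- function(x)
{
    # 5th and 95th percentiles of the column
    quant <- quantile(x, c(0.05, 0.95))
    # pull values below the 5th percentile up to the smallest remaining value
    x[x < quant[1]] <- min(x[x >= quant[1]])
    # pull values above the 95th percentile down to the largest remaining value
    x[x > quant[2]] <- max(x[x <= quant[2]])
    # rescale to [0, 1] and round to one decimal place
    return(round((x - min(x)) / abs(max(x) - min(x)), 1))
}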

Then sapply this to each variable in your dataframe (called c here):

sapply(c,foo)
       a   b
 [1,] 1.0 1.0
 [2,] 0.7 0.7
 [3,] 0.3 0.3
 [4,] 0.7 0.7
 [5,] 0.3 0.3
 [6,] 0.0 0.0
 [7,] 0.3 0.3
 [8,] 0.7 0.7
 [9,] 1.0 1.0
[10,] 0.7 0.7
[11,] 0.0 0.0
[12,] 1.0 1.0
[13,] 0.3 0.3
[14,] 0.7 0.7
[15,] 0.3 0.3
[16,] 1.0 1.0
[17,] 0.0 0.0
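For a version you can run end to end, here is a rough sketch. The data frame `df` and its values are invented for illustration (they are not the data behind the output above), and `foo` is repeated only so the snippet stands on its own. Since sapply returns a matrix, the last lines show one way to keep a data frame instead:

```r
# Same function as in the answer, repeated so the snippet runs on its own.
foo <- function(x)
{
    quant <- quantile(x, c(0.05, 0.95))
    x[x < quant[1]] <- min(x[x >= quant[1]])
    x[x > quant[2]] <- max(x[x <= quant[2]])
    return(round((x - min(x)) / abs(max(x) - min(x)), 1))
}

# Hypothetical data: two numeric columns, each with one extreme value.
df <- data.frame(a = c(1:19, 100), b = c(-50, 2:20))

res <- sapply(df, foo)          # numeric matrix, one column per variable
df_clean <- df                  # copy, so the original data stays untouched
df_clean[] <- lapply(df, foo)   # same values, but keeps the data.frame structure
```

Note that foo has no NA handling, so quantile() will stop with an error on columns that contain missing values.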

Edit: This answer was meant to solve the programming problem. As for actually using it, I fully agree with Hadley.
