Remove outliers from correlation coefficient calculation

有刺的猬 2021-01-31 22:57

Assume we have two numeric vectors x and y. The Pearson correlation coefficient between x and y is given by

5 answers
  •  暖寄归人
    2021-01-31 23:33

    Here's another possibility that captures the outliers, using a scheme similar to Prasad's:

    library(mvoutlier)
    set.seed(1)
    x <- rnorm(1000)
    y <- rnorm(1000)
    xy <- cbind(x, y)
    # The documentation/default says alpha=0.025. I think the function wants 0.975.
    outliers <- aq.plot(xy, alpha=0.975)
    cor.plot(x, y)
    color.plot(xy)
    dd.plot(xy)
    uni.plot(xy)
    

    In the other answers, 500 was appended to the end of x and y as an outlier. That may or may not cause a memory problem on your machine, so I dropped it down to 4 to avoid that.

    x1 <- c(x, 4)
    y1 <- c(y, 4)
    xy1 <- cbind(x1, y1)
    # The documentation/default says alpha=0.025. I think the function wants 0.975.
    outliers1 <- aq.plot(xy1, alpha=0.975)
    cor.plot(x1, y1)
    color.plot(xy1)
    dd.plot(xy1)
    uni.plot(xy1)
    

    Here are the images from the x1, y1, xy1 data:

    [3 images: plots of the x1, y1, xy1 data]
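
    The code above stores the flagged points in outliers1 but stops short of recomputing the correlation without them. Here is a minimal sketch of that last step. It assumes a plain Mahalanobis-distance rule from base R as a stand-in for aq.plot's adaptive method, so it is not the mvoutlier algorithm and may flag a different set of points. The names d2, cutoff, is_outlier, cor_all and cor_clean are made up for illustration.

    ```r
    # Sketch: flag bivariate outliers by squared Mahalanobis distance, then
    # compare the Pearson correlation with and without the flagged points.
    set.seed(1)
    x1 <- c(rnorm(1000), 4)   # same construction as above: one point appended at (4, 4)
    y1 <- c(rnorm(1000), 4)
    xy1 <- cbind(x1, y1)

    # Classical (non-robust) squared Mahalanobis distance of each row
    # from the sample mean, using the sample covariance matrix.
    d2 <- mahalanobis(xy1, center = colMeans(xy1), cov = cov(xy1))

    # Assumed rule: flag rows beyond the 97.5% chi-squared quantile
    # (degrees of freedom = number of columns).
    cutoff <- qchisq(0.975, df = ncol(xy1))
    is_outlier <- d2 > cutoff

    cor_all   <- cor(x1, y1)                            # every point
    cor_clean <- cor(x1[!is_outlier], y1[!is_outlier])  # flagged points dropped

    print(sum(is_outlier))                      # how many rows were flagged
    print(c(all = cor_all, clean = cor_clean))
    ```

    If the object returned by aq.plot carries a logical outliers component, as I understand the package documentation to say, then outliers1$outliers could be used in place of is_outlier here. Note that a fixed 97.5% cutoff also flags a few ordinary points in the tails of clean normal data, so the trimmed correlation is not just the result of removing the one appended point.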
