Remove outliers from correlation coefficient calculation

有刺的猬 2021-01-31 22:57

Assume we have two numeric vectors x and y. The Pearson correlation coefficient between x and y is given by cor(x, y).
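For reference, the Pearson coefficient that cor() returns by default (method = "pearson") is the covariance of x and y divided by the product of their standard deviations. The short base-R check below confirms that definition against cor(). The data are simulated purely for illustration, and pearson_by_hand is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: Pearson correlation computed from its definition
pearson_by_hand <- function(x, y) {
  xc <- x - mean(x)  # centre both vectors
  yc <- y - mean(y)
  sum(xc * yc) / sqrt(sum(xc^2) * sum(yc^2))
}

set.seed(1)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)  # illustrative data with a positive relationship

r_manual  <- pearson_by_hand(x, y)
r_builtin <- cor(x, y)     # method = "pearson" is the default
all.equal(r_manual, r_builtin)
```

The two values agree up to floating-point error. Every observation enters the sums, so a single extreme point can dominate the result.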

5 answers

暖寄归人 (original poster)

2021-01-31 23:33

Here's another possibility, which also captures the outliers. It uses a scheme similar to Prasad's:

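library(mvoutlier)
set.seed(1)
x <- rnorm(1000)
y <- rnorm(1000)
xy <- cbind(x, y)
# The documentation/default says alpha = 0.025; I think the function wants 0.975
outliers <- aq.plot(xy, alpha = 0.975)  # adjusted quantile plot; result kept in outliers
cor.plot(x, y)    # classical vs. robust correlation
color.plot(xy)
dd.plot(xy)       # distance-distance plot
uni.plot(xy)      # univariate views of the multivariate outliers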
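Multivariate outlier rules of this kind typically compare each point's squared Mahalanobis distance with a chi-square quantile. For two variables the usual choice is qchisq(0.975, df = 2). As far as I know, mvoutlier's plots use robust (MCD-based) estimates for those distances. The base-R sketch below only illustrates the cutoff itself. It uses the known centre and covariance of simulated standard-normal data, so about 2.5% of perfectly clean points land beyond it by chance. It is not a reimplementation of aq.plot, and it makes no claim about how aq.plot's alpha argument is used.

```r
set.seed(1)
xy <- cbind(x = rnorm(1000), y = rnorm(1000))

# Squared Mahalanobis distances w.r.t. the known centre (0, 0) and identity covariance
d2 <- mahalanobis(xy, center = c(0, 0), cov = diag(2))

cutoff <- qchisq(0.975, df = ncol(xy))  # about 7.38 for two variables
frac_beyond <- mean(d2 > cutoff)        # expected to be close to 0.025
cutoff
frac_beyond
```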

In the other answers, 500 was appended to the end of x and y as an outlier. That may or may not cause a memory problem on your machine, so I reduced it to 4 to avoid that.
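The size of the appended point matters a great deal. With two independent standard-normal vectors, a single extra point at (500, 500) dominates the sums in the Pearson formula and pushes cor() close to 1. A point at (4, 4) only nudges it. The base-R comparison below uses simulated data built the same way as above.

```r
set.seed(1)
x <- rnorm(1000)
y <- rnorm(1000)

r_clean <- cor(x, y)                   # near 0: x and y are independent
r_4     <- cor(c(x, 4),   c(y, 4))     # mild outlier: small change
r_500   <- cor(c(x, 500), c(y, 500))   # extreme outlier: close to 1

round(c(clean = r_clean, with_4 = r_4, with_500 = r_500), 3)
```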

x1 <- c(x, 4)
y1 <- c(y, 4)
xy1 <- cbind(x1, y1)
# The documentation/default says alpha = 0.025; I think the function wants 0.975
outliers1 <- aq.plot(xy1, alpha = 0.975)
cor.plot(x1, y1)
color.plot(xy1)
dd.plot(xy1)
uni.plot(xy1)

Here are the images from the x1, y1, xy1 data:

(images not preserved)
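The plots flag outliers, but the correlation still has to be recomputed without them. The sketch below shows that last step under a simple assumed rule. It drops points whose classical squared Mahalanobis distance exceeds qchisq(0.975, df = 2), then calls cor() on the remaining points. This swaps in classical (non-robust) estimates for the robust ones mvoutlier uses. Classical estimates can be masked when several outliers cluster together. cor_without_outliers is a hypothetical helper. If the flags held in outliers1 are used instead, the same subsetting idea applies.

```r
# Hypothetical helper: drop points beyond a chi-square cutoff on classical
# squared Mahalanobis distances, then compute the Pearson correlation.
cor_without_outliers <- function(x, y, prob = 0.975) {
  xy <- cbind(x, y)
  d2 <- mahalanobis(xy, center = colMeans(xy), cov = cov(xy))
  flagged <- d2 > qchisq(prob, df = ncol(xy))
  list(
    cor_all  = cor(x, y),
    cor_kept = cor(x[!flagged], y[!flagged]),
    flagged  = flagged
  )
}

set.seed(1)
x1 <- c(rnorm(1000), 4)   # same construction as x1/y1 above
y1 <- c(rnorm(1000), 4)
res <- cor_without_outliers(x1, y1)
res$flagged[1001]         # is the appended (4, 4) point flagged?
sum(res$flagged)          # total number of points dropped
c(all = res$cor_all, kept = res$cor_kept)
```

Because the cutoff is a 97.5% quantile, a few clean points are dropped along with the real outlier. That trade-off is inherent to any fixed-quantile rule.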
