Remove outliers from correlation coefficient calculation

有刺的猬 2021-01-31 22:57

Assume we have two numeric vectors x and y. The Pearson correlation coefficient between x and y is given by

5 Answers
  •  青春惊慌失措
    2021-01-31 23:13

    You might try resampling your data (repeatedly drawing a random subset without replacement) and keeping the highest correlation coefficient, e.g.:

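    x <- cars$dist
    y <- cars$speed
    percent <- 0.9         # given in the question above
    n <- 1000              # number of resamples
    k <- round(length(x) * percent)   # size of each subset
    # draw k distinct indices from 1..length(x); sample(k) alone would only permute 1..k
    boot.cor <- replicate(n, {tmp <- sample(length(x), k, replace=FALSE); cor(x[tmp], y[tmp])})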
    

    Afterwards, run max(boot.cor). Do not be disappointed if the correlation coefficients all turn out to be nearly the same :)
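
max() on its own only reports the best coefficient, not which points were left out. Here is a rough sketch of the same random-subset idea that also remembers the winning subset. The names idx, sub.cor, best.idx and dropped, and the fixed seed, are my own assumptions for illustration and not part of the answer above. It is a random search, so it only finds the best subset among those it happened to try.

```r
# Sketch: random subsampling that also remembers the best subset found.
set.seed(42)                       # assumption: fixed seed for reproducibility

x <- cars$dist
y <- cars$speed
percent <- 0.9                     # fraction of points to keep
n <- 1000                          # number of random subsets to try
k <- round(length(x) * percent)    # subset size

# Each column of idx holds k distinct indices drawn from 1..length(x).
idx <- replicate(n, sample(length(x), k, replace = FALSE))

# Correlation of each candidate subset (one value per column of idx).
sub.cor <- apply(idx, 2, function(i) cor(x[i], y[i]))

best <- which.max(sub.cor)                   # column of the best subset
best.idx <- sort(idx[, best])                # observations that were kept
dropped <- setdiff(seq_along(x), best.idx)   # observations left out

max(sub.cor)   # highest correlation among the subsets tried
dropped        # indices of the points treated as outliers
```

Increasing n makes it more likely that a good subset is tried, at the cost of more cor() calls.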
