Spearman correlation and ties

 ̄綄美尐妖づ 提交于 2019-11-28 04:33:08

Well, Kendall tau rank correlation is also a non-parametric test for statistical dependence between two ordinal (or rank-transformed) variables--like Spearman's, but unlike Spearman's, can handle ties.

More specifically, there are three Kendall tau statistics--tau-a, tau-b, and tau-c. tau-b is specifically adapted to handle ties.

The tau-b statistic handles ties (i.e., both members of the pair have the same ordinal value) by a divisor term, which represents the geometric mean between the number of pairs not tied on x and the number not tied on y.

Kendall's tau is not Spearman's--they are not the same, but they are also quite similar. You'll have to decide, based on context, whether the two are similar enough such one can be substituted for the other.

For instance, tau-b:

Kendall_tau_b = (P - Q) / ( (P + Q + Y0)*(P + Q + X0) )^0.5

P: number of concordant pairs ('concordant' means the ranks of each member of the pair of data points agree)

Q: number of discordant pairs

X0: number of pairs not tied on x

Y0: number of pairs not tied on y

There is in fact a variant of Spearman's rho that explicitly accounts for ties. In situations in which i needed a non-parametric rank correlation statistic, i have always chosen tau over rho. The reason is that rho sums the squared errors, whereas tau sums the absolute discrepancies. Given that both tau and rho are competent statistics and we are left to choose, a linear penalty on discrepancies (tau) has always seemed to me, a more natural way to express rank correlation. That's not a recommendation, your context might be quite different and dictate otherwise.

Eduardo Maçan

I think exact=FALSE does the trick.

cor.test(c(1,2,3,4,5,6,7,8), c(0,0,0,0,0,0,7,8), method="spearman", exact=FALSE)

    Spearman's rank correlation rho

data:  c(1, 2, 3, 4, 5, 6, 7, 8) and c(0, 0, 0, 0, 0, 0, 7, 8)
S = 19.8439, p-value = 0.0274
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.7637626 

cor.test with method="spearman" actually calculates Spearman coefficient corrected for ties. I've checked it by "manually" calculating tie-corrected and tie-uncorrected Spearman coefficients from equations in Zar 1984, Biostatistical Analysis. Here's the code - just substitute your own variable names to check for yourself:

ym <- data.frame(lousy, dors) ## my data

## ranking variables
ym$l <- rank(ym$lousy)
ym$d <- rank(ym$dors)


## calculating squared differences between ranks
ym$d2d <- (ym$l-ym$d)^2



## calculating variables for equations 19.35 and 19.37 in Zar 1984

lice <- as.data.frame(table(ym$lousy))

lice$t <- lice$Freq^3-lice$Freq

dorsal <- as.data.frame(table(ym$dors))

dorsal$t <- dorsal$Freq^3-dorsal$Freq

n <- nrow(ym)
sum.d2 <- sum(ym$d2d)
Tx <- sum(lice$t)/12
Ty <-sum(dorsal$t)/12


## calculating the coefficients

rs1 <- 1 - (6*sum.d2/(n^3-n))  ## "standard" Spearman cor. coeff. (uncorrected for ties) - eq. 19.35

rs2 <- ((n^3-n)/6 - sum.d2 - Tx - Ty)/sqrt(((n^3-n)/6 - 2*Tx)*((n^3-n)/6 - 2*Ty)) ## Spearman cor.coeff. corrected for ties - eq.19.37


##comparing with cor.test function
cor.test(ym$lousy,ym$dors, method="spearman") ## cor.test gives tie-corrected coefficient!
  • Ties-corrected Spearman

    Using method="spearman" gives you the ties-corrected Spearman. Spearman's rho, according to the definition, is simply the Pearson's sample correlation coefficient computed for ranks of sample data. So it works both in presence and in absence of ties. You can see that after replacing your original data with their ranks (midranks for ties) and using method="pearson", you will get the same result:

    > cor.test(rank(c(1,2,3,4,5,6,7,8)), rank(c(0,0,0,0,0,0,7,8)), method="pearson")
    
    Pearson's product-moment correlation
    
    data:  rank(c(1, 2, 3, 4, 5, 6, 7, 8)) and rank(c(0, 0, 0, 0, 0, 0, 7, 8))
    t = 2.8983, df = 6, p-value = 0.0274
    alternative hypothesis: true correlation is not equal to 0
    95 percent confidence interval:
     0.1279559 0.9546436
    sample estimates:
      cor 
    0.7637626 
    

    Notice, there exists a simplified no-ties Spearman version, that is in fact used in cor.test() implementation in absence of ties, but it is equivalent to the definition above.

  • P-value

    In case of ties in data, exact p-values are not computed neither for Spearman nor for Kendall measures (within cor.test() implementation), hence the warning. As mentioned in Eduardo's post, for not to get a warning you should set exact=FALSE,

Moniba

The paper "A new rank correlation coefficient with application to the consensus ranking problem" is aimed to solve the ranking with tie problem. It also mentions that Tau-b should not be used as a ranking correlation measure for measuring agreement between weak orderings.

Emond, E. J. and Mason, D. W. (2002), A new rank correlation coefficient with application to the consensus ranking problem. J. Multi‐Crit. Decis. Anal., 11: 17-28. doi:10.1002/mcda.313

liana_oliveira

I was having a similar problem and by reading the answers here and the help file on R I saw that, when you have ties, you have to add the parameter exact = FALSE) to the cor.test() function. By adding this, it does not try to calculate an exact P value, but instead "the test statistic is the estimate scaled to zero mean and unit variance, and is approximately normally distributed". The result, in my case, was exactly the same, but without the warning about ties.

cor.test(x, y, method = "spearm", exact = FALSE)

The R package ConsRank contains an implementation of Edmon and Mason's Tau_X. This appears to be the (mathematically) best currently known method for handling ties.

See the docs, which give the usage as

Tau_X(X, Y=NULL)

where X can be a matrix.

As pointed out by @wibeasley, Emond and Mason (2002) proposed Tau_X, a new rank correlation coefficient which appears to be superior to Kendal's Tau-b. NelsonGon was concerned that the paper is from 2002, predating the question by a few years, but seems to have overlooked that Spearman's correlation dates from 1904, and Kendall's Tau from 1938.

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!