Spearman correlation and ties


I'm computing Spearman's rho on small sets of paired rankings. Spearman is well known for not handling ties properly. For example, taking 2 sets of 8 rankings, even if 6 a


7 Answers

灰色年华

2020-12-04 17:15

The paper "A new rank correlation coefficient with application to the consensus ranking problem" aims to solve the problem of ranking with ties. It also argues that tau-b should not be used as a rank correlation measure of agreement between weak orderings.

Emond, E. J. and Mason, D. W. (2002), A new rank correlation coefficient with application to the consensus ranking problem. J. Multi‐Crit. Decis. Anal., 11: 17-28. doi:10.1002/mcda.313

天命终不由人

2020-12-04 17:16
I had a similar problem. From the answers here and the R help file, I saw that when you have ties, you have to add the argument exact = FALSE to the cor.test() call. With this argument, R does not try to calculate an exact p-value; instead, "the test statistic is the estimate scaled to zero mean and unit variance, and is approximately normally distributed". In my case the result was exactly the same, but without the warning about ties.
```
cor.test(x, y, method = "spearman", exact = FALSE)
```
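To see the effect described above, here is a minimal sketch that runs both calls and records whether a warning is raised. The data are the example vectors used elsewhere in this thread; the variable names (`warned`, `res_default`, `res_approx`) are only illustrative.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0, 0, 0, 0, 0, 0, 7, 8)   # six tied values

# Default call: catch the ties warning instead of printing it.
warned <- FALSE
res_default <- withCallingHandlers(
  cor.test(x, y, method = "spearman"),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)

# Same test, explicitly asking for the approximate p-value.
res_approx <- cor.test(x, y, method = "spearman", exact = FALSE)

print(warned)
print(c(default = unname(res_default$estimate),
        approx  = unname(res_approx$estimate)))
print(c(default = res_default$p.value, approx = res_approx$p.value))
```

With tied data, both calls should report the same estimate and p-value; only the warning differs.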

一个人的身影

2020-12-04 17:20

I think exact=FALSE does the trick.

```
cor.test(c(1,2,3,4,5,6,7,8), c(0,0,0,0,0,0,7,8), method="spearman", exact=FALSE)

    Spearman's rank correlation rho

data:  c(1, 2, 3, 4, 5, 6, 7, 8) and c(0, 0, 0, 0, 0, 0, 7, 8)
S = 19.8439, p-value = 0.0274
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.7637626
```


盖世英雄少女心

2020-12-04 17:23

cor.test with method="spearman" actually calculates the Spearman coefficient corrected for ties. I checked this by "manually" calculating the tie-corrected and tie-uncorrected Spearman coefficients from the equations in Zar 1984, Biostatistical Analysis. Here's the code; just substitute your own variable names to check for yourself:

```
ym <- data.frame(lousy, dors) ## my data

## ranking variables
ym$l <- rank(ym$lousy)
ym$d <- rank(ym$dors)

## calculating squared differences between ranks
ym$d2d <- (ym$l - ym$d)^2

## calculating variables for equations 19.35 and 19.37 in Zar 1984
lice <- as.data.frame(table(ym$lousy))
lice$t <- lice$Freq^3 - lice$Freq

dorsal <- as.data.frame(table(ym$dors))
dorsal$t <- dorsal$Freq^3 - dorsal$Freq

n <- nrow(ym)
sum.d2 <- sum(ym$d2d)
Tx <- sum(lice$t)/12
Ty <- sum(dorsal$t)/12

## calculating the coefficients
rs1 <- 1 - (6*sum.d2/(n^3-n))  ## "standard" Spearman cor. coeff. (uncorrected for ties) - eq. 19.35

rs2 <- ((n^3-n)/6 - sum.d2 - Tx - Ty)/sqrt(((n^3-n)/6 - 2*Tx)*((n^3-n)/6 - 2*Ty)) ## Spearman cor. coeff. corrected for ties - eq. 19.37

## comparing with cor.test function
cor.test(ym$lousy, ym$dors, method="spearman") ## cor.test gives tie-corrected coefficient!
```
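Since the `lousy` and `dors` data are not shown, the same check can be sketched on the small tied example used elsewhere in this thread. This is only an illustration of the two formulas quoted above; the helper `tie_term` and the other variable names are assumptions made for this sketch.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0, 0, 0, 0, 0, 0, 7, 8)

n  <- length(x)
d2 <- sum((rank(x) - rank(y))^2)   # sum of squared midrank differences

# Tie term: sum over groups of tied values of (t^3 - t), divided by 12.
tie_term <- function(v) {
  f <- as.numeric(table(v))
  sum(f^3 - f) / 12
}
Tx <- tie_term(x)
Ty <- tie_term(y)

base <- (n^3 - n) / 6
rs_uncorrected <- 1 - d2 / base    # same as 1 - 6 * d2 / (n^3 - n)
rs_corrected   <- (base - d2 - Tx - Ty) /
  sqrt((base - 2 * Tx) * (base - 2 * Ty))

rs_builtin <- cor(x, y, method = "spearman")

print(c(uncorrected = rs_uncorrected,
        corrected   = rs_corrected,
        builtin     = rs_builtin))
```

On this data the corrected value agrees with the rho of 0.7637626 reported by cor.test() elsewhere in the thread, while the uncorrected formula gives a larger value (about 0.79).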


南方客

2020-12-04 17:25
Well, Kendall's tau rank correlation is also a non-parametric measure of statistical dependence between two ordinal (or rank-transformed) variables. It is like Spearman's rho but, unlike the standard uncorrected form of Spearman's rho, it can handle ties.

More specifically, there are three Kendall tau statistics--tau-a, tau-b, and tau-c. tau-b is specifically adapted to handle ties.

The tau-b statistic handles ties (i.e., both members of the pair have the same ordinal value) by a divisor term, which represents the geometric mean between the number of pairs not tied on x and the number not tied on y.

Kendall's tau is not Spearman's rho. They are not the same, but they are quite similar. You'll have to decide, based on context, whether the two are similar enough that one can be substituted for the other.

For instance, tau-b:
```
Kendall_tau_b = (P - Q) / ( (P + Q + Y0)*(P + Q + X0) )^0.5
```
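P: number of concordant pairs (a pair of data points is concordant when x and y order the two points the same way)

Q: number of discordant pairs

X0: number of pairs tied only on x

Y0: number of pairs tied only on y

With these definitions, P + Q + Y0 is the number of pairs not tied on x, and P + Q + X0 is the number of pairs not tied on y.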
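As a rough illustration of the tau-b formula above, the sketch below counts the pairs directly on the thread's example data and compares the result with R's built-in `cor(..., method = "kendall")`. It assumes the standard tau-b convention in which X0 counts pairs tied only on x and Y0 counts pairs tied only on y, so that P + Q + Y0 is the number of pairs not tied on x; the variable names are illustrative.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0, 0, 0, 0, 0, 0, 7, 8)

# All unordered pairs of observations (one pair per column).
idx <- combn(length(x), 2)
dx  <- sign(x[idx[1, ]] - x[idx[2, ]])
dy  <- sign(y[idx[1, ]] - y[idx[2, ]])

P  <- sum(dx * dy > 0)           # concordant pairs
Q  <- sum(dx * dy < 0)           # discordant pairs
X0 <- sum(dx == 0 & dy != 0)     # pairs tied only on x
Y0 <- sum(dx != 0 & dy == 0)     # pairs tied only on y

tau_b <- (P - Q) / sqrt((P + Q + X0) * (P + Q + Y0))

print(c(P = P, Q = Q, X0 = X0, Y0 = Y0))
print(c(manual = tau_b, builtin = cor(x, y, method = "kendall")))
```

The pairwise count is quadratic in the number of observations, which is fine for small sets of rankings like the ones in the question.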

There is in fact a variant of Spearman's rho that explicitly accounts for ties. In situations in which I needed a non-parametric rank correlation statistic, I have always chosen tau over rho. The reason is that rho sums the squared errors, whereas tau sums the absolute discrepancies. Given that both tau and rho are competent statistics and we are left to choose, a linear penalty on discrepancies (tau) has always seemed to me a more natural way to express rank correlation. That's not a recommendation; your context might be quite different and dictate otherwise.
无人及你

2020-12-04 17:34
- Ties-corrected Spearman
  
  Using method="spearman" gives you the ties-corrected Spearman. Spearman's rho is, by definition, simply Pearson's sample correlation coefficient computed on the ranks of the sample data, so it works both with and without ties. You can see this by replacing your original data with their ranks (midranks for ties) and using method="pearson"; you will get the same result:
```
> cor.test(rank(c(1,2,3,4,5,6,7,8)), rank(c(0,0,0,0,0,0,7,8)), method="pearson")

Pearson's product-moment correlation

data:  rank(c(1, 2, 3, 4, 5, 6, 7, 8)) and rank(c(0, 0, 0, 0, 0, 0, 7, 8))
t = 2.8983, df = 6, p-value = 0.0274
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.1279559 0.9546436
sample estimates:
  cor 
0.7637626 
```
  Note that there is also a simplified Spearman formula that is valid only when there are no ties. It is in fact what the cor.test() implementation uses in the absence of ties, where it is equivalent to the definition above.
- P-value
  
  When the data contain ties, cor.test() does not compute exact p-values for either the Spearman or the Kendall measure, hence the warning. As mentioned in Eduardo's post, set exact=FALSE to avoid the warning.
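The equivalence between the simplified formula and the rank-based definition can be sketched as follows. The helper names `shortcut` and `by_definition` and the tie-free vector `y_no_ties` are made up for this illustration.

```r
x         <- c(1, 2, 3, 4, 5, 6, 7, 8)
y_no_ties <- c(2, 1, 4, 3, 6, 5, 8, 7)   # no repeated values
y_ties    <- c(0, 0, 0, 0, 0, 0, 7, 8)   # six tied values

# Simplified formula, valid only without ties.
shortcut <- function(a, b) {
  n <- length(a)
  1 - 6 * sum((rank(a) - rank(b))^2) / (n^3 - n)
}

# Definition: Pearson correlation of the (mid)ranks.
by_definition <- function(a, b) {
  cor(rank(a), rank(b), method = "pearson")
}

# Without ties the two agree.
print(c(shortcut   = shortcut(x, y_no_ties),
        definition = by_definition(x, y_no_ties)))

# With ties the simplified formula drifts away from the definition.
print(c(shortcut   = shortcut(x, y_ties),
        definition = by_definition(x, y_ties)))
```

This is why a hand calculation with the simplified formula can disagree with cor.test() on tied data even though both are "Spearman's rho".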