Spearman correlation and ties

南旧 2020-12-04 16:45

I'm computing Spearman's rho on small sets of paired rankings. Spearman is well known for not handling ties properly. For example, taking 2 sets of 8 rankings, even if 6 a

7 answers
  • 2020-12-04 17:15

    The paper "A new rank correlation coefficient with application to the consensus ranking problem" aims to solve the problem of ranking with ties. It also argues that tau-b should not be used as a rank correlation measure of agreement between weak orderings.

    Emond, E. J. and Mason, D. W. (2002), A new rank correlation coefficient with application to the consensus ranking problem. J. Multi‐Crit. Decis. Anal., 11: 17-28. doi:10.1002/mcda.313

  • 2020-12-04 17:16

    I was having a similar problem, and from reading the answers here and the R help file I saw that, when you have ties, you have to add the parameter exact = FALSE to the cor.test() function. With this, it does not try to calculate an exact p-value; instead, "the test statistic is the estimate scaled to zero mean and unit variance, and is approximately normally distributed". The result, in my case, was exactly the same, but without the warning about ties.

    cor.test(x, y, method = "spearman", exact = FALSE)
    
  • 2020-12-04 17:20

    I think exact=FALSE does the trick.

    cor.test(c(1,2,3,4,5,6,7,8), c(0,0,0,0,0,0,7,8), method="spearman", exact=FALSE)
    
        Spearman's rank correlation rho
    
    data:  c(1, 2, 3, 4, 5, 6, 7, 8) and c(0, 0, 0, 0, 0, 0, 7, 8)
    S = 19.8439, p-value = 0.0274
    alternative hypothesis: true rho is not equal to 0
    sample estimates:
          rho 
    0.7637626 
    
  • 2020-12-04 17:23

    cor.test with method="spearman" actually calculates the Spearman coefficient corrected for ties. I've checked this by "manually" calculating the tie-corrected and tie-uncorrected Spearman coefficients from the equations in Zar 1984, Biostatistical Analysis. Here's the code; just substitute your own variable names to check for yourself:

    ym <- data.frame(lousy, dors)  ## my data

    ## ranking variables
    ym$l <- rank(ym$lousy)
    ym$d <- rank(ym$dors)

    ## calculating squared differences between ranks
    ym$d2d <- (ym$l - ym$d)^2

    ## calculating variables for equations 19.35 and 19.37 in Zar 1984
    lice <- as.data.frame(table(ym$lousy))
    lice$t <- lice$Freq^3 - lice$Freq
    dorsal <- as.data.frame(table(ym$dors))
    dorsal$t <- dorsal$Freq^3 - dorsal$Freq

    n <- nrow(ym)
    sum.d2 <- sum(ym$d2d)
    Tx <- sum(lice$t) / 12
    Ty <- sum(dorsal$t) / 12

    ## calculating the coefficients
    ## "standard" Spearman cor. coeff. (uncorrected for ties) - eq. 19.35
    rs1 <- 1 - (6 * sum.d2 / (n^3 - n))
    ## Spearman cor. coeff. corrected for ties - eq. 19.37
    rs2 <- ((n^3 - n) / 6 - sum.d2 - Tx - Ty) /
      sqrt(((n^3 - n) / 6 - 2 * Tx) * ((n^3 - n) / 6 - 2 * Ty))

    ## comparing with cor.test function
    cor.test(ym$lousy, ym$dors, method = "spearman")  ## cor.test gives tie-corrected coefficient!
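
    For a self-contained version of this check, here is a minimal sketch that applies the same two equations to the example vectors used elsewhere in this thread (x = 1..8 and a y with six tied values). The helper name spearman_zar is made up for illustration; only the base R functions rank(), table() and cor() are used.

    ```r
    ## Hypothetical helper: Spearman's rho with and without the tie correction
    ## (same equations as above, Zar 1984 eq. 19.35 and 19.37).
    spearman_zar <- function(x, y) {
      n <- length(x)
      sum.d2 <- sum((rank(x) - rank(y))^2)  # rank() assigns midranks to ties
      tx <- table(x)                        # sizes of the tie groups in x
      ty <- table(y)                        # sizes of the tie groups in y
      Tx <- sum(tx^3 - tx) / 12
      Ty <- sum(ty^3 - ty) / 12
      m <- (n^3 - n) / 6
      c(uncorrected = 1 - sum.d2 / m,
        corrected   = (m - sum.d2 - Tx - Ty) / sqrt((m - 2 * Tx) * (m - 2 * Ty)))
    }

    x <- c(1, 2, 3, 4, 5, 6, 7, 8)
    y <- c(0, 0, 0, 0, 0, 0, 7, 8)
    rs <- spearman_zar(x, y)
    rs
    ## the corrected value should agree with R's built-in estimate
    all.equal(unname(rs["corrected"]), cor(x, y, method = "spearman"))
    ```

    With these vectors the uncorrected formula gives about 0.79, while the corrected one matches the 0.7637626 reported by cor.test() in this thread.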
    
  • 2020-12-04 17:25

    Well, Kendall tau rank correlation is also a non-parametric test for statistical dependence between two ordinal (or rank-transformed) variables--like Spearman's, but unlike Spearman's, can handle ties.

    More specifically, there are three Kendall tau statistics--tau-a, tau-b, and tau-c. tau-b is specifically adapted to handle ties.

    The tau-b statistic handles ties (i.e., both members of the pair have the same ordinal value) by a divisor term, which represents the geometric mean between the number of pairs not tied on x and the number not tied on y.

    Kendall's tau is not Spearman's rho. They are not the same, but they are quite similar. You'll have to decide, based on context, whether the two are similar enough that one can be substituted for the other.

    For instance, tau-b:

    Kendall_tau_b = (P - Q) / ( (P + Q + Y0)*(P + Q + X0) )^0.5
    

    P: number of concordant pairs (a pair of data points is 'concordant' when the two points are ordered the same way on x as on y)

    Q: number of discordant pairs

    X0: number of pairs tied only on x

    Y0: number of pairs tied only on y
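
    To make the pair counting concrete, here is a minimal sketch in base R that evaluates the formula by brute force over all pairs. It takes X0 and Y0 to be the numbers of pairs tied only on x and only on y, which is the reading under which the divisor is the geometric mean of the pairs not tied on x and the pairs not tied on y. The vectors are the example used elsewhere in this thread, and the variable names are made up for illustration.

    ```r
    x <- c(1, 2, 3, 4, 5, 6, 7, 8)
    y <- c(0, 0, 0, 0, 0, 0, 7, 8)

    idx <- combn(length(x), 2)             # all index pairs i < j, one per column
    sx <- sign(x[idx[1, ]] - x[idx[2, ]])  # ordering of each pair on x
    sy <- sign(y[idx[1, ]] - y[idx[2, ]])  # ordering of each pair on y

    P  <- sum(sx * sy > 0)                 # concordant pairs
    Q  <- sum(sx * sy < 0)                 # discordant pairs
    X0 <- sum(sx == 0 & sy != 0)           # pairs tied only on x
    Y0 <- sum(sx != 0 & sy == 0)           # pairs tied only on y

    tau_b <- (P - Q) / sqrt((P + Q + X0) * (P + Q + Y0))
    tau_b
    ## compare with R's built-in Kendall estimate
    all.equal(tau_b, cor(x, y, method = "kendall"))
    ```

    The loop over all pairs is quadratic in the sample size, which is fine for small sets of rankings like the ones in the question.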

    There is in fact a variant of Spearman's rho that explicitly accounts for ties. In situations in which I needed a non-parametric rank correlation statistic, I have always chosen tau over rho. The reason is that rho sums the squared errors, whereas tau sums the absolute discrepancies. Given that both tau and rho are competent statistics and we are left to choose, a linear penalty on discrepancies (tau) has always seemed to me a more natural way to express rank correlation. That's not a recommendation; your context might be quite different and dictate otherwise.

  • 2020-12-04 17:34
    • Ties-corrected Spearman

      Using method="spearman" gives you the ties-corrected Spearman. Spearman's rho is, by definition, simply Pearson's sample correlation coefficient computed on the ranks of the sample data, so it works both in the presence and in the absence of ties. You can see this by replacing your original data with their ranks (midranks for ties) and using method="pearson"; you will get the same result:

      > cor.test(rank(c(1,2,3,4,5,6,7,8)), rank(c(0,0,0,0,0,0,7,8)), method="pearson")
      
      Pearson's product-moment correlation
      
      data:  rank(c(1, 2, 3, 4, 5, 6, 7, 8)) and rank(c(0, 0, 0, 0, 0, 0, 7, 8))
      t = 2.8983, df = 6, p-value = 0.0274
      alternative hypothesis: true correlation is not equal to 0
      95 percent confidence interval:
       0.1279559 0.9546436
      sample estimates:
        cor 
      0.7637626 
      

      Note that there is also a simplified no-ties version of Spearman's formula, which is in fact what the cor.test() implementation uses in the absence of ties; in that case it is equivalent to the definition above.

    • P-value

      When there are ties in the data, exact p-values are computed for neither the Spearman nor the Kendall measure (within the cor.test() implementation), hence the warning. As mentioned in Eduardo's post, to avoid the warning you should set exact=FALSE.
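
      A minimal sketch of that behaviour, using the example vectors from this thread: the default call is wrapped in withCallingHandlers() only to record whether a warning was raised, and the variable names are made up for illustration.

      ```r
      x <- c(1, 2, 3, 4, 5, 6, 7, 8)
      y <- c(0, 0, 0, 0, 0, 0, 7, 8)

      ## default call: record and silence any warning it raises
      warned <- FALSE
      res_default <- withCallingHandlers(
        cor.test(x, y, method = "spearman"),
        warning = function(w) {
          warned <<- TRUE
          invokeRestart("muffleWarning")
        }
      )

      ## explicit request for the approximate p-value
      res_approx <- cor.test(x, y, method = "spearman", exact = FALSE)

      warned                                              # was a warning raised?
      all.equal(res_default$p.value, res_approx$p.value)  # same p-value either way?
      ```

      If the two p-values agree, then setting exact=FALSE changes nothing about the result for tied data; it only states up front that the approximation is wanted.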
