Why can't I get a p-value smaller than 2.2e-16?

死守一世寂寞 2020-12-13 18:13

I've found this issue with t-tests and chi-squared tests in R, but I assume it applies to other tests as well. If I do:

a <- 1:10
b <- 100:110
t.test(a, b)

then the p-value is reported as < 2.2e-16, and I can't get anything smaller.
6 Answers
  • 2020-12-13 18:37

    Try something like t.test(a, b)$p.value and see if that gives you the accuracy you need. I believe this has more to do with how the result is printed than with the actual stored value, which should have the necessary precision.
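
    For example, with the vectors from the question:

    a <- 1:10
    b <- 100:110
    tt <- t.test(a, b)
    tt          # printed summary truncates the p-value at "< 2.2e-16"
    tt$p.value  # the stored double retains the full value, far below 2.2e-16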

  • 2020-12-13 18:46

    I'm puzzled by several things in the exchange of answers and comments here.

    First of all, when I try the OP's original example I don't get a p-value as small as the ones being debated here (tested on several different 2.13.x versions and R-devel):

    a <- 1:10
    b <- 10:20
    t.test(a,b)
    ## data:  a and b 
    ## t = -6.862, df = 18.998, p-value = 1.513e-06
    

    Second, when I make the difference between groups much bigger, I do in fact get the results suggested by @eWizardII:

    a <- 1:10
    b <- 110:120
    (t1 <- t.test(a,b))
    # data:  a and b 
    # t = -79.0935, df = 18.998, p-value < 2.2e-16
    #
    > t1$p.value
    [1] 2.138461e-25
    

    The printed output of t.test is driven by its call to stats:::print.htest (which is also called by other statistical testing functions such as chisq.test, as the OP notes). print.htest in turn calls format.pval, which renders any p-value smaller than its eps argument (.Machine$double.eps by default) as < eps. I'm surprised to find myself disagreeing with such generally astute commenters ...
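
    You can see this by calling format.pval directly:

    format.pval(2.138461e-25)               # below the default eps, so rendered as "< eps"
    format.pval(2.138461e-25, eps = 1e-30)  # above eps, so rendered numerically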

    Finally, although it seems silly to worry about the precise value of a very small p-value, the OP is correct that these values are often used as indices of strength of evidence in the bioinformatics literature -- for example, one might test 100,000 candidate genes and look at the distribution of the resulting p-values (search for "volcano plot" for one example of this sort of procedure).

  • 2020-12-13 18:50

    I recently had the same problem. A fellow statistician recommends recomputing the p-value directly from the stored test statistic:

    A <- cor.test(…)
    # two-sided p-value at full double precision; abs() makes this
    # correct for negative statistics as well
    p <- 2 * pt(abs(A$statistic), df = A$parameter, lower.tail = FALSE)
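
    For instance, for a Pearson correlation the statistic is Student-t with df = n - 2, so pt() reproduces the two-sided p-value at full precision (x and y below are made-up illustrative data):

    x <- 1:100
    y <- x + rnorm(100)  # near-perfect correlation
    A <- cor.test(x, y)  # printed summary reports p-value < 2.2e-16
    2 * pt(abs(A$statistic), df = A$parameter, lower.tail = FALSE)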
    
  • 2020-12-13 18:52

    Some R packages address this. One option is the pspearman package:

    install.packages("pspearman")  # available from CRAN
    library("pspearman")
    a <- c(1:110, 110)
    b <- 1:111
    out <- spearman.test(a, b, alternative = "greater",
                         approximation = "t-distribution")
    out$p.value
    

    [1] 3.819961e-294

  • 2020-12-13 18:56

    The Wikipedia page you linked to was for the Decimal64 type, which R does not use; R uses standard-issue doubles.

    First, some definitions from the .Machine help page.

    double.eps: the smallest positive floating-point number ‘x’ such that ‘1 + x != 1’. ... Normally ‘2.220446e-16’.

    double.xmin: the smallest non-zero normalized floating-point number ... Normally ‘2.225074e-308’.

    So you can represent numbers smaller than 2.2e-16, but their accuracy is diminished, which causes problems with calculations. Try some examples with numbers close to the smallest representable value; both values below are smaller than the smallest subnormal double, so they underflow to zero:

    2e-350 - 1e-350  # both operands are stored as 0, so the difference is 0
    sqrt(1e-350)     # sqrt(0) is 0
    

    You mentioned in a comment that you want to apply Bonferroni corrections. Rather than rolling your own code for this, I suggest using p.adjust(your_p_value, method = "bonferroni"), which is what pairwise.t.test uses.
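
    For example, with some hypothetical raw p-values:

    pvals <- c(1e-20, 0.01, 0.04)
    p.adjust(pvals, method = "bonferroni")  # each value multiplied by length(pvals), capped at 1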

  • 2020-12-13 18:59

    Two questions:

    1) What possible difference in statistical implication would there be between p-values of 1e-16 and 1e-32? If you can truly justify it, then working with logged values is the way to go (see the sketch at the end of this answer).

    2) Why consult Wikipedia when your interest is in the numerical accuracy of R?

    The R FAQ says that "Other [meaning non-integer] numbers have to be rounded to (typically) 53 binary digits accuracy," so 16 decimal digits is about the limit. You can check the limits of accuracy at the console:

    > .Machine$double.eps
    [1] 2.220446e-16
    

    That number is effectively zero when interpreted on the [0, 1] range of a p-value.
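
    If you truly need tail areas too small to represent as doubles, the distribution functions can return them on the log scale instead, which avoids underflow entirely. A minimal sketch with pnorm:

    pnorm(40, lower.tail = FALSE)                # underflows to 0
    pnorm(40, lower.tail = FALSE, log.p = TRUE)  # roughly -804.6, still representable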
