Why can't I get a p-value smaller than 2.2e-16?

死守一世寂寞 2020-12-13 18:13

I've found this issue with t-tests and chi-squared tests in R, but I assume it applies to other tests as well. If I do:

a <- 1:10
b <- 100:110
t.test(a, b)

then the p-value is reported as < 2.2e-16, and I can't get anything smaller.
6 Answers
  • 2020-12-13 18:37

    Try something like t.test(a, b)$p.value and see if that gives you the accuracy you need. I believe this has more to do with how the result is printed than with the actual stored value, which should have the necessary precision.
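
    For example, with the vectors from the question:

    a <- 1:10
    b <- 100:110
    tt <- t.test(a, b)
    tt          # printed summary truncates the p-value at "< 2.2e-16"
    tt$p.value  # the stored double retains the full value, far below 2.2e-16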

  • 2020-12-13 18:46

    I'm puzzled by several things in the exchange of answers and comments here.

    First of all, when I try the OP's original example I don't get a p-value as small as the ones being debated here (tested on several different 2.13.x versions and R-devel):

    a <- 1:10
    b <- 10:20
    t.test(a,b)
    ## data:  a and b 
    ## t = -6.862, df = 18.998, p-value = 1.513e-06
    

    Second, when I make the difference between groups much bigger, I do in fact get the results suggested by @eWizardII:

    a <- 1:10
    b <- 110:120
    (t1 <- t.test(a,b))
    # data:  a and b 
    # t = -79.0935, df = 18.998, p-value < 2.2e-16
    #
    > t1$p.value
    [1] 2.138461e-25
    

    The printed output of t.test is driven by its call to stats:::print.htest (which is also called by other statistical testing functions such as chisq.test, as the OP notes). print.htest in turn calls format.pval, which renders any p-value smaller than its eps argument (.Machine$double.eps by default) as < eps. I'm surprised to find myself disagreeing with such generally astute commenters ...
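
    You can see this by calling format.pval directly:

    format.pval(2.138461e-25)               # below the default eps, so rendered as "< eps"
    format.pval(2.138461e-25, eps = 1e-30)  # above eps, so rendered numerically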

    Finally, although it seems silly to worry about the precise value of a very small p-value, the OP is correct that these values are often used as indices of strength of evidence in the bioinformatics literature -- for example, one might test 100,000 candidate genes and look at the distribution of the resulting p-values (search for "volcano plot" for one example of this sort of procedure).

  • 2020-12-13 18:50

    I recently had the same problem. A fellow statistician recommends recomputing the p-value directly from the stored test statistic:

    A <- cor.test(…)
    # two-sided p-value at full double precision; abs() makes this
    # correct for negative statistics as well
    p <- 2 * pt(abs(A$statistic), df = A$parameter, lower.tail = FALSE)
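
    For instance, for a Pearson correlation the statistic is Student-t with df = n - 2, so pt() reproduces the two-sided p-value at full precision (x and y below are made-up illustrative data):

    x <- 1:100
    y <- x + rnorm(100)  # near-perfect correlation
    A <- cor.test(x, y)  # printed summary reports p-value < 2.2e-16
    2 * pt(abs(A$statistic), df = A$parameter, lower.tail = FALSE)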
    
  • 2020-12-13 18:52

    Some R packages address this. One option is the pspearman package:

    install.packages("pspearman")  # available from CRAN
    library("pspearman")
    a <- c(1:110, 110)
    b <- 1:111
    out <- spearman.test(a, b, alternative = "greater",
                         approximation = "t-distribution")
    out$p.value
    

    [1] 3.819961e-294

  • 2020-12-13 18:56

    The Wikipedia page you linked to was for the Decimal64 type, which R does not use; R uses standard-issue doubles.

    First, some definitions from the .Machine help page.

    double.eps: the smallest positive floating-point number ‘x’ such that ‘1 + x != 1’. ... Normally ‘2.220446e-16’.

    double.xmin: the smallest non-zero normalized floating-point number ... Normally ‘2.225074e-308’.

    So you can represent numbers smaller than 2.2e-16, but their accuracy is diminished, which causes problems with calculations. Try some examples with numbers close to the smallest representable value; both values below are smaller than the smallest subnormal double, so they underflow to zero:

    2e-350 - 1e-350  # both operands are stored as 0, so the difference is 0
    sqrt(1e-350)     # sqrt(0) is 0
    

    You mentioned in a comment that you want to apply Bonferroni corrections. Rather than rolling your own code for this, I suggest using p.adjust(your_p_value, method = "bonferroni"), which is what pairwise.t.test uses.
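
    For example, with some hypothetical raw p-values:

    pvals <- c(1e-20, 0.01, 0.04)
    p.adjust(pvals, method = "bonferroni")  # each value multiplied by length(pvals), capped at 1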

  • 2020-12-13 18:59

    Two questions:

    1) What possible difference in statistical implication would there be between p-values of 1e-16 and 1e-32? If you can truly justify it, then working with logged values is the way to go (see the sketch at the end of this answer).

    2) Why consult Wikipedia when your interest is in the numerical accuracy of R?

    The R FAQ says that "Other [meaning non-integer] numbers have to be rounded to (typically) 53 binary digits accuracy," so 16 decimal digits is about the limit. You can check the limits of accuracy at the console:

    > .Machine$double.eps
    [1] 2.220446e-16
    

    That number is effectively zero when interpreted on the [0, 1] range of a p-value.
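
    If you truly need tail areas too small to represent as doubles, the distribution functions can return them on the log scale instead, which avoids underflow entirely. A minimal sketch with pnorm:

    pnorm(40, lower.tail = FALSE)                # underflows to 0
    pnorm(40, lower.tail = FALSE, log.p = TRUE)  # roughly -804.6, still representable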
