问题
How to treat p value in R ?
I am expecting very low p values like:
1.00E-80
I need to -log10
-log10(1.00E-80)
-log10(0) is Inf, but Inf at sense of rounding too.
But is seems that after 1.00E-308, R yields 0.
1/10^308
[1] 1e-308
1/10^309
[1] 0
Is the accuracy of p-value display with lm
function the same as the cutoff point, 1e-308, or it is just designed such that we need a cutoff point and I need to consider a different cutoff point - such as 1e-100 (for example) to replace 0 with <1e-100.
回答1:
There are a variety of possible answers -- which one is most useful depends on the context:
- R is indeed incapable under ordinary circumstances of storing floating-point values closer to zero than
.Machine$double.xmin
, which varies by platform but is typically (as you discovered) on the order of1e-308
. If you really need to work with numbers this small and can't find a way to work on the log scale directly, you need to search Stack Overflow or the R wiki for methods for dealing with arbitrary/extended precision values (but you probably should try to work on the log scale -- it will be much less of a hassle) - in many circumstances R actually computes p values on the (natural) log scale internally, and can if requested return the log values rather than exponentiating them before giving the answer. For example,
dnorm(-100,log=TRUE)
gives -5000.919. You can convert directly to the log10 scale (without exponentiating and then usinglog10
) by dividing bylog(10)
:dnorm(-100,log=TRUE)/log(10)
=-2171, which would be too small to represent in floating point. For thep***
(cumulative distribution function) functions, uselog.p=TRUE
rather thanlog=TRUE
. (This particular point depends heavily on your particular context. Even if you are not using built-in R functions you may be able to find a way to extract results on the log scale.) - in some cases R presents p-value results as being
<2.2e-16
even when a more precise value is known:(t1 <- t.test(rnorm(10,100),rnorm(10,80)))
prints
....
t = 56.2902, df = 17.904, p-value < 2.2e-16
but you can still extract the precise p-value from the result
> t1$p.value
[1] 1.856174e-18
(in many cases this behaviour is controlled by the format.pval()
function)
An illustration of how all this would work with lm
:
d <- data.frame(x=rep(1:5,each=10))
set.seed(101)
d$y <- rnorm(50,mean=d$x,sd=0.0001)
lm1 <- lm(y~x,data=d)
summary(lm1)
prints the p-value of the slope as <2.2e-16
, but if we use coef(summary(lm1))
(which does not use the p-value formatting), we can see that the value is 9.690173e-203.
A more extreme case:
set.seed(101); d$y <- rnorm(50,mean=d$x,sd=1e-7)
lm2 <- lm(y~x,data=d)
coef(summary(lm2))
shows that the p-value has actually underflowed to zero. However, we can still get an answer on the log scale:
tval <- coef(summary(lm2))["x","t value"]
2*pt(abs(tval),df=48,lower.tail=FALSE,log.p=TRUE)/log(10)
gives -692.62 (you can check this approach with the previous example where the p-value doesn't overflow and see that you get the same answer as printed in the summary).
回答2:
Small numbers are generally hard to deal with.
The limit in R for infinite is caused by the use of double precision floating point :
?double All R platforms are required to work with values conforming to the IEC 60559 (also known as IEEE 754) standard. This basically works with a precision of 53 bits, and represents to that precision a range of absolute values from about 2e-308 to 2e+308.
http://en.wikipedia.org/wiki/Double_precision_floating-point_format
You may find the Rmpfr package helpful here as it allows you to create multiple precision numbers.
install.packages("Rmpfr")
require(Rmpfr)
log(mpfr(1/10^309, precBits=500))
来源:https://stackoverflow.com/questions/11328784/decimal-points-probability-value-of-0-in-language-r