Question
I have two sets of data.
I have plotted two probability density functions. Now I want the area between the two density curves within a certain x range.
I tried integrating the area, using the trapezoidal rule, etc., following these questions:
Calculating the area between a curve and a straight line without finding the function
Error calculating the area between two lines using "integrate"
How to measure area between 2 distribution curves in R / ggplot2
but none of them worked.
Here is the link to the data I am working on.
https://sheet.zoho.com/sheet/editor.do?doc=1ff030ea1af35f06f8303927d7ea62b3c4b04bdae021555e8cc43ed0569cb2aaceb26368f93db4d15ac66cf7662d9a7873e889e1763139a49ffd68e7843e0b44
dens.pre=density(TX/10)
dens.post=density(TX30/10)
plot(dens.pre,col="green")
lines(dens.post,col="red")
locator()
#$x
#[1] 18.36246
#$y
#[1] 0.05632428
abline(v=18.3,col="red")
I want to find the area between the two curves for x > 18.3.
Area between the curves:
Answer 1:
With the trapezoidal rule you could probably calculate it like this:
d0 <- dens.pre
d1 <- dens.post
f0 <- approxfun(d0$x, d0$y)
f1 <- approxfun(d1$x, d1$y)
# defining x range of the density overlap
ovrng <- c(18.3, min(max(d0$x), max(d1$x)))
# dividing it into sections (for example n=500)
i <- seq(min(ovrng), max(ovrng), length.out=500)
# calculating the signed distances between the density curves
h1 <- f0(i)-f1(i)
h2 <- f1(i)-f0(i)
# and using the formula for the area of a trapezoid we add up the areas
area1 <- sum( (h1[-1]+h1[-length(h1)]) /2 *diff(i) *(h1[-1]>=0)) # for the regions where d0>d1
area2 <- sum( (h2[-1]+h2[-length(h2)]) /2 *diff(i) *(h2[-1]>=0)) # for the regions where d1>d0
area_total <- area1 + area2
area_total
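As a rough cross-check of the two masked sums, the same total can be sketched by applying the trapezoidal rule to the absolute difference of the curves. The data here are simulated stand-ins, since the question's TX and TX30 are not reproduced, and `trapz_abs` is a hypothetical helper name, not part of any package.

```r
# Hypothetical stand-in data for TX/10 and TX30/10 (assumption, not the real data)
set.seed(1)
pre  <- rnorm(500, mean = 15, sd = 3)
post <- rnorm(500, mean = 17, sd = 3)

d0 <- density(pre)
d1 <- density(post)
f0 <- approxfun(d0$x, d0$y)
f1 <- approxfun(d1$x, d1$y)

# Stay inside the range where both interpolants are defined;
# approxfun() returns NA outside the x values it was built from
lower <- max(18.3, min(d0$x), min(d1$x))
upper <- min(max(d0$x), max(d1$x))
i <- seq(lower, upper, length.out = 500)

# Trapezoidal rule applied to |y| (hypothetical helper)
trapz_abs <- function(x, y) {
  a <- abs(y)
  sum((a[-1] + a[-length(a)]) / 2 * diff(x))
}

area_abs <- trapz_abs(i, f1(i) - f0(i))
area_abs
```

This should agree closely with `area1 + area2`; small differences can appear in the few sections where the curves cross, because the masked version assigns each section by the sign at its right endpoint only.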
Since you are interested only in the region where one curve remains below the other over the whole range, the first calculation can be shortened:
d0 <- dens.pre
d1 <- dens.post
f0 <- approxfun(d0$x, d0$y)
f1 <- approxfun(d1$x, d1$y)
# defining x range of the density overlap
ovrng <- c(18.3, min(max(d0$x), max(d1$x)))
# dividing it into sections (for example n=500)
i <- seq(min(ovrng), max(ovrng), length.out=500)
# calculating the distance between the density curves
h1 <- f1(i)-f0(i)
# and using the formula for the area of a trapezoid we add up the areas where d1>d0
area <- sum( (h1[-1]+h1[-length(h1)]) /2 *diff(i) *(h1[-1]>=0))
area
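# We can plot the region using
plot(d0, main="d0=black, d1=green")
lines(d1, col="green")
# pick every 5th section where d1 is above d0 (h1 is the distance computed above)
jj <- which(h1>0 & seq_along(h1) %% 5==0); j <- i[jj]
# draw vertical segments from d1 down to d0
segments(j, f1(j), j, f1(j)-h1[jj])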
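A possible variation, sketched under assumptions: `density()` accepts `from`, `to` and `n`, so both estimates can be evaluated on one shared grid starting at the cutoff, which avoids the `approxfun` step. The data are again simulated stand-ins for TX/10 and TX30/10, and the upper limit `hi` is an arbitrary choice (largest observation plus three bandwidths).

```r
# Hypothetical stand-in data (assumption, not the question's TX / TX30)
set.seed(42)
pre  <- rnorm(400, mean = 15, sd = 3)
post <- rnorm(400, mean = 17, sd = 3)

lo <- 18.3                                   # cutoff from the question
hi <- max(pre, post) + 3 * max(bw.nrd0(pre), bw.nrd0(post))

# Evaluate both kernel density estimates on the same x grid
d0 <- density(pre,  from = lo, to = hi, n = 1024)
d1 <- density(post, from = lo, to = hi, n = 1024)
stopifnot(isTRUE(all.equal(d0$x, d1$x)))

x <- d0$x
h <- d1$y - d0$y                             # signed gap between the curves

# Trapezoidal rule on the absolute gap
area_between <- sum((abs(h)[-1] + abs(h)[-length(h)]) / 2 * diff(x))
area_between
```

Because both curves share the same x values, the gap is a plain vector subtraction and no interpolation range has to be worked out by hand.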
There are other (and more detailed) solutions here and here
Source: https://stackoverflow.com/questions/55337926/area-between-the-two-curves