R: area under curve of ogive?

♀尐吖头ヾ submitted on 2019-12-12 03:38:31

Question


I have an algorithm that uses an x,y plot of sorted y data to produce an ogive.

I then compute the area under the curve to obtain percentages.

I'd like to do something similar using kernel density estimation. I like how kernel densities smooth out the upper and lower bounds (i.e. the min and max extend slightly beyond my hard-coded input).

Either way... I was wondering if there is a way to treat an ogive as a type of cumulative distribution function and/or use kernel density estimation to derive a cumulative distribution function given y data?

I apologize if this is a confusing question. I know there is a way to derive a cumulative frequency graph (i.e. an ogive). However, I can't work out how to derive a percentage from this cumulative frequency graph.

What I don't want is an ECDF. I know how to do that, and it is not quite what I am trying to capture. Rather, I want to integrate an ogive over given intervals.


Answer 1:


I'm not exactly sure what you have in mind, but here's a way to calculate the area under the curve for a kernel density estimate. More generally, it works for any case where you have the y values at equally spaced x values, and it can of course be generalized to variable x intervals as well:

library(zoo)

# Kernel density estimate
# Set n to higher value to get a finer grid
set.seed(67839)
dens = density(c(rnorm(500,5,2),rnorm(200,20,3)), n=2^5)

# How to extract the x and y values of the density estimate
#dens$y
#dens$x

# x interval
dx = median(diff(dens$x))

# mean height for each pair of y values
h = rollmean(dens$y, 2)

# Area under curve
sum(h*dx)  # 1.000943

# Cumulative area
# cumsum(h*dx)

# Plot density, showing points at which density is calculated 
plot(dens)
abline(v=dens$x, col="#FF000060", lty="11")

# Plot cumulative area under curve, showing mid-point of each x-interval
plot(dens$x[-length(dens$x)] + 0.5*dx, cumsum(h*dx), type="l")
abline(v=dens$x[-length(dens$x)] + 0.5*dx, col="#FF000060", lty="11")
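
If the goal is the percentage of the distribution between two bounds, one option is to integrate the density estimate over just that interval. The following is a minimal sketch and not part of the original answer: the helper name `area_between` and the bounds 0 and 10 are hypothetical choices for illustration. The trapezoid widths come from `diff()`, so the same helper should also work when the x values are not equally spaced.

```r
# Kernel density estimate of the same simulated mixture, on a finer grid
set.seed(67839)
x <- c(rnorm(500, 5, 2), rnorm(200, 20, 3))
dens <- density(x, n = 2^10)

# Hypothetical helper: trapezoid-rule area under (gx, gy) between a and b.
# Widths come from diff(), so the grid need not be equally spaced.
area_between <- function(gx, gy, a, b) {
  # Linear interpolation of the curve; treated as 0 outside the grid
  f <- approxfun(gx, gy, yleft = 0, yright = 0)
  # Grid points strictly inside (a, b), plus the two bounds themselves
  px <- c(a, gx[gx > a & gx < b], b)
  py <- f(px)
  sum(diff(px) * (head(py, -1) + tail(py, -1)) / 2)
}

# Area over the whole grid (should be close to 1)
total <- area_between(dens$x, dens$y, min(dens$x), max(dens$x))

# Area between the two illustrative bounds
p <- area_between(dens$x, dens$y, 0, 10)

# Percentage of the estimated distribution between 0 and 10
100 * p / total
```

Dividing by `total` is a design choice: the numerical area of a density estimate is usually slightly off from 1, so normalizing keeps the percentages consistent with each other.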

UPDATE to include ecdf function

To address your comments, look at the two plots below. The first is the empirical cumulative distribution function (ECDF) of the mixture of normal distributions that I used above. Note that this plot looks the same as the cumulative-area plot above. The second is the ECDF of a plain vanilla normal distribution with mean=0 and sd=1.

set.seed(67839)
x = c(rnorm(500,5,2),rnorm(200,20,3))
plot(ecdf(x), do.points=FALSE)

plot(ecdf(rnorm(1000)))
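
For comparison, an ogive built from binned counts can be read as a piecewise-linear cumulative distribution function, in which case the percentage between two values is simply the difference in its height at those values. This is a rough sketch under assumptions, not something from the original answer: the default bin edges from `hist()` and the bounds 0 and 10 are illustrative, and `ogive` is a hypothetical name.

```r
set.seed(67839)
x <- c(rnorm(500, 5, 2), rnorm(200, 20, 3))

# Bin the data; the default breaks chosen by hist() are an arbitrary choice here
h <- hist(x, plot = FALSE)

# Ogive: cumulative relative frequency at each bin edge, joined by straight lines
cum_freq <- c(0, cumsum(h$counts)) / sum(h$counts)
ogive <- approxfun(h$breaks, cum_freq, yleft = 0, yright = 1)

# Percentage of observations between two bounds (0 and 10 are illustrative)
pct <- 100 * (ogive(10) - ogive(0))
pct
```

Unlike the ECDF, the result depends on the chosen bins, because the ogive assumes observations are spread evenly within each bin.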



Source: https://stackoverflow.com/questions/36944874/r-area-under-curve-of-ogive
