Question
I have used splinebins() from the binsmooth package to fit a smoothed distribution to binned income data (for example, to estimate the mean):
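```r
library(binsmooth)
# right-hand edges of the income bins; NA marks the unbounded top bin
binedges <- c(10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000,
              50000, 60000, 75000, 100000, 125000, 150000, 200000, NA)
bincounts <- c(157532, 97369, 102673, 100888, 90835, 94191, 87688, 90481,
               79816, 153581, 195430, 240948, 155139, 9452, 92166, 103217)
# third argument: an estimate of the mean of the distribution
splb <- splinebins(binedges, bincounts, 76091)
```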
Calling splb$splineCDF(x) returns the cumulative probability y for a given income x, but I want to go the other way: find the x at which y = 0.5, i.e. the median.
I understand that the function in the question "get x-value given y-value: general root finding for linear / non-linear interpolation function" is supposed to achieve this, but it doesn't appear to work for functions created with the binsmooth package.
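Since a CDF never decreases, one general workaround (not the method from the linked question) is to tabulate the CDF on a fine grid and build the inverse by swapping x and y in approxfun(). The sketch below uses pnorm() as a hypothetical stand-in for splb$splineCDF so that the true median (60000) is known; cdf and invert_cdf are illustrative names, and the sketch assumes the CDF accepts a vector of x values.

```r
# Hypothetical stand-in for splb$splineCDF: a normal CDF with median 60000
cdf <- function(x) pnorm(x, mean = 60000, sd = 25000)

# Invert a non-decreasing CDF numerically by tabulating it and swapping axes
invert_cdf <- function(cdf, lower, upper, n = 10000) {
  x <- seq(lower, upper, length.out = n)
  y <- cdf(x)
  # approxfun(y, x) interpolates x as a function of y;
  # ties = min keeps the smallest x wherever the CDF is flat
  approxfun(y, x, ties = min)
}

inv <- invert_cdf(cdf, lower = 0, upper = 200000)
median_est <- inv(0.5)
print(median_est)
```

With binsmooth the call would be invert_cdf(splb$splineCDF, 0, 200000), evaluated at 0.5; accuracy is limited by the grid spacing.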
I've put together a brute-force approach that finds an approximate value, but it is unsatisfying and computationally expensive:
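```r
splb$splineCDF(50000)  # cumulative probability at an income of 50,000

# brute force: step income up by 10 until the CDF reaches 0.5
income <- 0
while (splb$splineCDF(income) < 0.5) {
  income <- income + 10
}
income  # first multiple of 10 at which the CDF is >= 0.5
```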
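The step loop needs one CDF evaluation per 10 units of income, so reaching an income m costs about m/10 calls (6,000 for m = 60,000). Because the CDF is non-decreasing, a hand-written bisection (a swapped-in technique, not something binsmooth provides) halves the bracket on every pass and needs only about log2((upper - lower)/tol) evaluations: 18 for [0, 200000] with a tolerance of 1. A minimal sketch, again with a hypothetical pnorm() stand-in; bisect_quantile is an illustrative name.

```r
# Hypothetical stand-in for splb$splineCDF: normal CDF with median 60000
cdf <- function(x) pnorm(x, mean = 60000, sd = 25000)

# Bisection on a non-decreasing CDF: find x with cdf(x) = p to within tol
bisect_quantile <- function(cdf, p = 0.5, lower, upper, tol = 1) {
  # the target must be bracketed by the starting interval
  stopifnot(cdf(lower) <= p, cdf(upper) >= p)
  n_eval <- 0
  while (upper - lower > tol) {
    mid <- (lower + upper) / 2
    n_eval <- n_eval + 1
    if (cdf(mid) < p) {
      lower <- mid  # quantile lies above mid
    } else {
      upper <- mid  # quantile lies at or below mid
    }
  }
  list(estimate = (lower + upper) / 2, evaluations = n_eval)
}

res <- bisect_quantile(cdf, lower = 0, upper = 200000)
print(res)
```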
Any ideas?
Answer 1:
I'd be tempted to first try a numerical optimiser to find the median and see whether it works well enough. Validation is easy in this case: check how close splb$splineCDF(solution) is to 0.5. You could add a test, e.g. if abs(splb$splineCDF(solution) - 0.5) > 0.001, stop the script and debug.
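That validation step could look like the following sketch. check_median is a hypothetical helper and pnorm() stands in for splb$splineCDF; the 0.001 tolerance is the one suggested above.

```r
# Hypothetical stand-in for splb$splineCDF
cdf <- function(x) pnorm(x, mean = 60000, sd = 25000)

# Stop with an error if the CDF at the proposed median is not close to 0.5
check_median <- function(cdf, solution, tol = 0.001) {
  gap <- abs(cdf(solution) - 0.5)
  if (gap > tol) {
    stop(sprintf("CDF at %.2f is %.5f, off by %.5f",
                 solution, cdf(solution), gap))
  }
  invisible(gap)
}

check_median(cdf, 60000)  # passes silently
res <- tryCatch(check_median(cdf, 70000), error = function(e) "failed")
print(res)
```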
The solution uses optimize() from base R's stats package:
```r
# manual step version: walk up in steps of 10 until the CDF reaches 0.5
manual_version <- function(splb) {
  income <- 0
  while (splb$splineCDF(income) < 0.5) {
    income <- income + 10
  }
  return(income)
}
```
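If splineCDF accepts a vector (the plotting code in optim_version relies on this when it calls objfun(x_range)), the same step-of-10 search can be done with a single vectorised call instead of a loop. A sketch with a hypothetical stand-in CDF; grid_version is an illustrative name.

```r
# Hypothetical stand-in for splb$splineCDF (vectorised, like pnorm)
cdf <- function(x) pnorm(x, mean = 60000, sd = 25000)

# Same result as the while loop, but one vectorised CDF call over the grid
grid_version <- function(cdf, upper = 200000, step = 10) {
  grid <- seq(0, upper, by = step)
  probs <- cdf(grid)
  idx <- which(probs >= 0.5)[1]  # first grid point where the CDF reaches 0.5
  if (is.na(idx)) stop("CDF never reaches 0.5 below 'upper'")
  grid[idx]
}

print(grid_version(cdf))  # [1] 60000
```

This still evaluates every grid point, so it trades the loop overhead for memory rather than reducing the number of CDF evaluations.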
```r
# try using a one dimensional optimiser - see ?optimize
optim_version <- function(splb, edges = binedges, plot = TRUE) {
  # continuous objective whose minimum is at the median; it is unimodal
  # on the interval as long as the CDF is increasing there
  objfun <- function(x) {
    (0.5 - splb$splineCDF(x))^2
  }
  # search only between the finite bin edges
  search_range <- range(edges, na.rm = TRUE)
  # visualise the objective function
  if (plot) {
    x_range <- seq(search_range[1], search_range[2], length.out = 100)
    plot(x_range, objfun(x_range), type = "l",
         main = "objective function to minimise")
  }
  # one dimensional optimisation to get the point closest to 0.5 CDF
  out <- optimize(f = objfun, interval = search_range)
  return(out$minimum)
}
```
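Instead of minimising a squared distance, the median can also be treated directly as the root of F(x) - 0.5, which stats::uniroot() solves by bracketing. This is a swapped-in root-finding approach, closer to the linked question's idea than to the optimiser; uniroot() requires the function to change sign over the interval. A sketch with a hypothetical stand-in CDF (root_version is an illustrative name); with binsmooth you would pass splb$splineCDF and range(binedges, na.rm = TRUE).

```r
# Hypothetical stand-in for splb$splineCDF
cdf <- function(x) pnorm(x, mean = 60000, sd = 25000)

# Root-finding version: solve cdf(x) - 0.5 = 0 on a bracketing interval
root_version <- function(cdf, interval, tol = 1e-6) {
  f <- function(x) cdf(x) - 0.5
  # uniroot() needs f(lower) and f(upper) to have opposite signs
  out <- uniroot(f, interval = interval, tol = tol)
  out$root
}

m <- root_version(cdf, interval = c(10000, 200000))
print(m)
print(cdf(m))
```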
```r
# test them out: both CDF values should be close to 0.5
v1 <- manual_version(splb)
v2 <- optim_version(splb, plot = TRUE)
splb$splineCDF(v1)
splb$splineCDF(v2)
```
```r
# time them
library(microbenchmark)
microbenchmark(
  "manual" = manual_version(splb),
  "optim"  = optim_version(splb, plot = FALSE),
  times = 50
)
```
Source: https://stackoverflow.com/questions/55965970/aproxfun-function-from-binsmooth-package-find-x-from-y-value