Custom Theano Op to do numerical integration

一曲冷凌霜 提交于 2019-12-01 05:50:44

I found your question because I'm trying to build a random variable in PyMC3 that represents a general point process (Hawkes, Cox, Poisson, etc) and the likelihood function has an integral. I really want to be able to use Hamiltonian Monte Carlo or NUTS samplers, so I needed that integral with respect to time to be differentiable.

Starting off of your attempt, I made an integrateOut theano Op that seems to work correctly with the behavior I need. I've tested it out on a few different inputs (not on my stats model just yet, but it appears promising!). I'm a total theano n00b, so pardon any stupidity. I would greatly appreciate feedback if anyone has any. Not sure it's exactly what you're looking for, but here's my solution (example at the bottom and in the doc strings). *EDIT: simplified some remnants of screwing around with ways to do this.

import theano
import theano.tensor as T
from scipy.integrate import quad

class integrateOut(theano.Op):
    """
    Integrate out a variable from an expression, computing
    the definite integral w.r.t. the variable specified
    !!! Only implemented in this for scalars !!!


    Parameters
    ----------
    f : scalar
        input 'function' to integrate
    t : scalar
        the variable to integrate out
    t0: float
        lower integration limit
    tf: float
        upper integration limit

    Returns
    -------
    scalar
        a new scalar with the 't' integrated out

    Notes
    -----

    usage of this looks like:
    x = T.dscalar('x')
    y = T.dscalar('y')
    t = T.dscalar('t')

    z = (x**2 + y**2)*t

    # integrate z w.r.t. t as a function of (x,y)
    intZ = integrateOut(z,t,0.0,5.0)(x,y)
    gradIntZ = T.grad(intZ,[x,y])

    funcIntZ = theano.function([x,y],intZ)
    funcGradIntZ = theano.function([x,y],gradIntZ)

    """
    def __init__(self,f,t,t0,tf,*args,**kwargs):
        super(integrateOut,self).__init__()
        self.f = f
        self.t = t
        self.t0 = t0
        self.tf = tf

    def make_node(self,*inputs):
        self.fvars=list(inputs)
        # This will fail when taking the gradient... don't be concerned
        try:
            self.gradF = T.grad(self.f,self.fvars)
        except:
            self.gradF = None
        return theano.Apply(self,self.fvars,[T.dscalar().type()])

    def perform(self,node, inputs, output_storage):
        # Everything else is an argument to the quad function
        args = tuple(inputs)
        # create a function to evaluate the integral
        f = theano.function([self.t]+self.fvars,self.f)
        # actually compute the integral
        output_storage[0][0] = quad(f,self.t0,self.tf,args=args)[0]

    def grad(self,inputs,grads):
        return [integrateOut(g,self.t,self.t0,self.tf)(*inputs)*grads[0] \
            for g in self.gradF]

x = T.dscalar('x')
y = T.dscalar('y')
t = T.dscalar('t')

z = (x**2+y**2)*t

intZ = integrateOut(z,t,0,1)(x,y)
gradIntZ = T.grad(intZ,[x,y])
funcIntZ = theano.function([x,y],intZ)
funcGradIntZ = theano.function([x,y],gradIntZ)
print funcIntZ(2,2)
print funcGradIntZ(2,2)

SymPy is proving harder than anticipated, but in the meantime in case anyone's finding this useful, I'll also point out how to modify this Op to allow for changing the final timepoint without creating a new Op. This can be useful if you have a point process, or if you have uncertainty in your time measurements.

class integrateOut2(theano.Op):
    def __init__(self, f, int_var, *args,**kwargs):
        super(integrateOut2,self).__init__()
        self.f = f
        self.int_var = int_var

    def make_node(self, *inputs):
        tmax = inputs[0]
        self.fvars=list(inputs[1:])

        return theano.Apply(self, [tmax]+self.fvars, [T.dscalar().type()])

    def perform(self, node, inputs, output_storage):
        # Everything else is an argument to the quad function
        tmax = inputs[0]
        args = tuple(inputs[1:])

        # create a function to evaluate the integral
        f = theano.function([self.int_var]+self.fvars, self.f)

        # actually compute the integral
        output_storage[0][0] = quad(f, 0., tmax, args=args)[0]

    def grad(self, inputs, grads):
        tmax = inputs[0]
        param_grads = T.grad(self.f, self.fvars)

        ## Recall fundamental theorem of calculus
        ## d/dt \int^{t}_{0}f(x)dx = f(t)
        ## So sub in t_max to the graph
        FTC_grad = theano.clone(self.f, {self.int_var: tmax})

        grad_list = [FTC_grad*grads[0]] + \
                    [integrateOut2(grad_fn, self.int_var)(*inputs)*grads[0] \
                     for grad_fn in param_grads]

        return grad_list

I always use the following code where I generate B = 10000 samples of n = 30 observations from a normal distribution with µ = 1 and σ 2 = 2.25. For each sample, the parameters µ and σ are estimated and stored in a matrix. I hope this can help you.

loglik <- function(p,z){
sum(dnorm(z,mean=p[1],sd=p[2],log=TRUE))
}
set.seed(45)
n <- 30
x <- rnorm(n,mean=1,sd=1.5)
optim(c(mu=0,sd=1),loglik,control=list(fnscale=-1),z=x)

B <- 10000
bootstrap.results <- matrix(NA,nrow=B,ncol=3)
colnames(bootstrap.results) <- c("mu","sigma","convergence")
for (b in 1:B){
sample.b <- rnorm(n,mean=1,sd=1.5)
m.b <- optim(c(mu=0,sd=1),loglik,control=list(fnscale=-1),z=sample.b)
bootstrap.results[b,] <- c(m.b$par,m.b$convergence)
}

One can also obtain the ML estimate of λ and use the bootstrap to estimate the bias and the standard error of the estimate. First calculate the MLE of λ Then, we estimate the bias and the standard error of λˆ by a nonparametric bootstrap.

B <- 9999
lambda.B <- rep(NA,B)
n <- length(w.time)
for (b in 1:B){
b.sample <- sample(1:n,n,replace=TRUE)
lambda.B[b] <- 1/mean(w.time[b.sample])
}
bias <- mean(lambda.B-m$estimate)
sd(lambda.B)

In the second part we calculate a 95% confidence interval for the mean time between failures.

n <- length(w.time)
m <- mean(w.time)
se <- sd(w.time)/sqrt(n)
interval.1 <- m + se * qnorm(c(0.025,0.975))
interval.1

But we can also use the the assumption that the data are from an exponential distribution. In that case we have varX¯ = 1/(nλ^2) = θ^{2}/n which can be estimated by X¯^{2}/n.

sd.m <- sqrt(m^2/n)
interval.2 <- m + sd.m * qnorm(c(0.025,0.975))
interval.2

We can also estimate the standard error of ˆθ by means of a boostrap procedure. We use the nonparametric bootstrap, that is, we sample from the original sample with replacement.

B <- 9999
m.star <- rep(NA,B)
for (b in 1:B){
m.star[b] <- mean(sample(w.time,replace=TRUE))
}
sd.m.star <- sd(m.star)
interval.3 <- m + sd.m.star * qnorm(c(0.025,0.975))
interval.3
An interval not based on the assumption of normality of ˆθ is obtained by the percentile method:

interval.4 <- quantile(m.star, probs=c(0.025,0.975))
interval.4
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!