How can I efficiently calculate the binomial cumulative distribution function?

你的背包 2020-12-07 22:10

Let's say that I know the probability of a "success" is P. I run the test N times, and I see S successes. The test is akin to tossing an unevenly weighted coin (perhaps

10 Answers
  • 2020-12-07 22:42

    I can't totally vouch for the efficiency, but SciPy has a module for this:

    from scipy.stats import binom
    # P(X <= successes) for X ~ Binomial(attempts, chance_of_success_per_attempt)
    binom.cdf(successes, attempts, chance_of_success_per_attempt)
    
  • 2020-12-07 22:46

    Exact Binomial Distribution

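    from functools import reduce  # Python 3: reduce is no longer a builtin
    
    def factorial(n):
        if n < 2: return 1
        return reduce(lambda x, y: x*y, range(2, int(n)+1))
    
    def prob(s, p, n):
        """P(X <= s) for X ~ Binomial(n, p)."""
        x = 1.0 - p    # probability of a failure
    
        a = n - s      # fewest failures that still leave at most s successes
        b = s + 1
    
        c = a + b - 1  # equals n
    
        total = 0.0
    
        # j counts failures; integer division keeps the coefficient exact
        for j in range(a, c + 1):
            total += factorial(c) // (factorial(j)*factorial(c-j)) \
                    * x**j * (1 - x)**(c-j)
    
        return total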
    
    >>> prob(20, 0.3, 100)
    0.016462853241869437
    
    >>> 1-prob(40-1, 0.3, 100)
    0.020988576003924564
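
    The factorial-based sum above builds very large integers as n grows. A minimal sketch of one alternative, assuming Python 3 and only the standard library: compute each PMF term in log space with math.lgamma, then add the terms up. The name binom_cdf_log is hypothetical and not part of the original answer.

```python
import math

def binom_cdf_log(s, n, p):
    """P(X <= s) for X ~ Binomial(n, p), with each term computed in log space."""
    if s < 0:
        return 0.0
    if s >= n:
        return 1.0
    if p <= 0.0:
        return 1.0  # no successes are possible, so X = 0 <= s
    if p >= 1.0:
        return 0.0  # X = n with certainty, and s < n here
    log_p = math.log(p)
    log_q = math.log1p(-p)  # log(1 - p)
    terms = []
    for k in range(s + 1):
        # log of the binomial coefficient C(n, k) via the log-gamma function
        log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                      - math.lgamma(n - k + 1))
        terms.append(math.exp(log_choose + k * log_p + (n - k) * log_q))
    return math.fsum(terms)

print(binom_cdf_log(20, 100, 0.3))
```

    For the example above, this should agree with prob(20, 0.3, 100) to many significant digits. It still takes s + 1 steps, but it never forms a huge factorial.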
    

    Normal Estimate, good for large n

    import math
    
    def erf(z):
        # polynomial approximation of the error function
        # (newer Pythons also ship math.erf)
        t = 1.0 / (1.0 + 0.5 * abs(z))
        # use Horner's method
        ans = 1 - t * math.exp( -z*z -  1.26551223 +
                                t * ( 1.00002368 +
                                t * ( 0.37409196 +
                                t * ( 0.09678418 +
                                t * (-0.18628806 +
                                t * ( 0.27886807 +
                                t * (-1.13520398 +
                                t * ( 1.48851587 +
                                t * (-0.82215223 +
                                t * ( 0.17087277))))))))))
        if z >= 0.0:
            return ans
        else:
            return -ans
    
    def normal_estimate(s, p, n):
        u = n * p               # mean of the binomial
        o = (u * (1-p)) ** 0.5  # standard deviation of the binomial
    
        # normal CDF evaluated at s
        return 0.5 * (1 + erf((s-u)/(o*2**0.5)))
    
    >>> normal_estimate(20, 0.3, 100)
    0.014548164531920815
    
    >>> 1-normal_estimate(40-1, 0.3, 100)
    0.024767304545069813
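
    On Python 2.7+ or 3.2+ the hand-written erf can be replaced by math.erf from the standard library. The sketch below assumes that, and adds an optional continuity correction (evaluating the normal CDF at s + 0.5), a common textbook adjustment that is not part of the original answer. The function name and the continuity flag are hypothetical.

```python
import math

def normal_estimate_stdlib(s, p, n, continuity=False):
    """Normal approximation to P(X <= s) for X ~ Binomial(n, p)."""
    mean = n * p
    std = math.sqrt(n * p * (1.0 - p))
    # optional continuity correction: shift the cut-off by half a unit
    x = s + 0.5 if continuity else s
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

print(normal_estimate_stdlib(20, 0.3, 100))
print(normal_estimate_stdlib(20, 0.3, 100, continuity=True))
```

    Without the correction, this should match normal_estimate above to roughly the accuracy of the polynomial erf. Whether the correction moves the estimate closer to the exact value depends on n, p and how far into the tail s lies, so check it against the exact sum for your own parameters.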
    

    Poisson Estimate: Good for large n and small p

    import math
    
    def poisson(s,p,n):
        L = n*p  # Poisson rate (lambda)
    
        total = 0  # renamed so the builtin sum is not shadowed
        for i in range(0, s+1):
            total += L**i/math.factorial(i)
    
        return total*math.e**(-L)
    
    >>> poisson(20, 0.3, 100)
    0.013411150012837811
    >>> 1-poisson(40-1, 0.3, 100)
    0.046253037645840323
    
  • 2020-12-07 22:46

    An efficient and, more importantly, numerically stable algorithm exists in the domain of Bezier curves used in computer-aided design: de Casteljau's algorithm, which evaluates the Bernstein polynomials that define Bezier curves.

    I believe that I am only allowed one link per answer, so start with Wikipedia: Bernstein Polynomials.

    Notice the very close relationship between the Binomial Distribution and the Bernstein Polynomials. Then click through to the link on de Casteljau's algorithm.

    Let's say I know the probability of throwing a heads with a particular coin is P. What is the probability of me throwing the coin T times and getting at least S heads?

    • Set n = T
    • Set beta[i] = 0 for i = 0, ... S - 1
    • Set beta[i] = 1 for i = S, ... T
    • Set t = p
    • Evaluate B(t) using de Casteljau

    or at most S heads?

    • Set n = T
    • Set beta[i] = 1 for i = 0, ... S
    • Set beta[i] = 0 for i = S + 1, ... T
    • Set t = p
    • Evaluate B(t) using de Casteljau
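
    The two recipes above can be sketched in Python as follows. This is a minimal illustration assuming the standard de Casteljau recurrence b[i] = (1 - t) * b[i] + t * b[i + 1]. The function names are hypothetical and do not come from openNurbs or Open CASCADE.

```python
def de_casteljau(beta, t):
    """Evaluate the Bernstein polynomial with coefficients beta at t."""
    b = list(beta)
    n = len(b)
    for r in range(1, n):
        # each pass blends neighbouring coefficients and shortens the row by one
        for i in range(n - r):
            b[i] = (1.0 - t) * b[i] + t * b[i + 1]
    return b[0]

def at_least(S, T, p):
    """P(at least S heads in T throws): beta[i] = 0 for i < S, 1 for i >= S."""
    beta = [0.0] * S + [1.0] * (T - S + 1)
    return de_casteljau(beta, p)

def at_most(S, T, p):
    """P(at most S heads in T throws): beta[i] = 1 for i <= S, 0 for i > S."""
    beta = [1.0] * (S + 1) + [0.0] * (T - S)
    return de_casteljau(beta, p)

print(at_most(20, 100, 0.3))
print(at_least(40, 100, 0.3))
```

    Each step is a convex combination of values in [0, 1], so no large factorials or tiny powers appear. The cost is on the order of T^2 multiplications.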

    Open source code probably exists already. NURBS Curves (Non-Uniform Rational B-spline Curves) are a generalization of Bezier Curves and are widely used in CAD. Try openNurbs (the license is very liberal) or failing that Open CASCADE (a somewhat less liberal and opaque license). Both toolkits are in C++, though, IIRC, .NET bindings exist.

  • 2020-12-07 22:48

    If you are using Python, there is no need to code it yourself. SciPy has you covered:

    >>> from scipy.stats import binom
    >>> # probability that you get 20 or fewer successes out of 100, when p=0.3
    >>> binom.cdf(20, 100, 0.3)
    0.016462853241869434
    
    >>> # probability that you get exactly 20 successes out of 100, when p=0.3
    >>> binom.pmf(20, 100, 0.3)
    0.0075756449257260777
    