Avoiding integer overflow with permutation and combination (nPr, nCr) functions in C

Question

I am attempting to write some statistics-related functions so I can carry out a few related procedures (i.e., probability calculations, generating Pascal's triangle to an arbitrary depth, etc.).

I have encountered an issue where I am most likely dealing with overflow. For example, if I want to calculate nPr for (n=30, r=1), I know that I can reduce it to:

30P1 = 30! / (30 - 1)!
     = 30! / (29)!
     = 30! / 29!
     = 30
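The same cancellation works for any n and r. For reference, the general identities (standard combinatorics, not taken from the question) show that nPr is just a product of r consecutive integers, and nCr divides that by r!:

```latex
{}_nP_r = \frac{n!}{(n-r)!} = n(n-1)\cdots(n-r+1) = \prod_{k=n-r+1}^{n} k,
\qquad
\binom{n}{r} = \frac{{}_nP_r}{r!} = \prod_{i=1}^{r} \frac{n-r+i}{i}.
```

With n = 30 and r = 1 the product has the single factor k = 30, matching the result above.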

However, when calculating using the functions below, it looks like I will always get invalid values due to integer overflow. Are there any workarounds that don't require the use of a library to support arbitrarily large numbers? I've read up a bit in other posts on the gamma functions, but couldn't find concrete examples.

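/* Every factorial is computed in int, so n! overflows for n >= 13
   when int is 32 bits wide. */
int factorial(int n) {
   return (n == 1 || n == 0) ? 1 : factorial(n - 1) * n;
}

int nPr(int n, int r);   /* declared here because nCr uses it first */

int nCr(int n, int r) {
   return (nPr(n,r) / factorial(r));
   //return factorial(n) / factorial(r) / factorial(n-r);
}


int nPr(int n, int r) {
   return (factorial(n) / factorial(n-r));
}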
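To see where the int version breaks down, a small check can compute factorials in long long (at least 64 bits by the C standard) and compare each one against INT_MAX. This is a minimal sketch; first_int_overflow is a hypothetical helper name, not part of the original code.

```c
#include <limits.h>

/* Hypothetical helper: returns the smallest n whose factorial exceeds
   INT_MAX. long long holds every n! up to 20! exactly, so the
   comparison itself cannot overflow in this range. */
int first_int_overflow(void) {
    long long f = 1;
    for (int n = 1; n <= 20; n++) {
        f *= n;
        if (f > INT_MAX)
            return n;
    }
    return -1; /* only reached if int is wider than 64 bits */
}
```

With the common 32-bit int it returns 13: 12! = 479001600 still fits, but 13! = 6227020800 does not. So factorial(30) has overflowed long before the division in nPr takes place.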

Answer 1:

You look like you are on the right track, so here you go:

#include <math.h>
#include <stdio.h>

/* lgamma(x) returns ln(|Gamma(x)|), and Gamma(k) == (k-1)!, so
   n! == n * Gamma(n). The leading ratios restore the factors that
   the lgamma terms leave out. */
int nCr(int n, int r) {
   if(r>n) {
      printf("FATAL ERROR\n"); return 0;
   }
   if(n==0 || r==0 || n==r) {
      return 1;
   } else {
      return (int)lround( ((double)n/(double)(n-r)/(double)r) * exp(lgamma(n) - lgamma(n-r) - lgamma(r)));
   }
}


int nPr(int n, int r) {
   if(r>n) {printf("FATAL ERROR\n"); return 0;}
   if(n==0 || r==0) {
      return 1;
   } else {
      if (n==r) {
         r = n - 1;   /* n!/0! == n!/1!, and this avoids lgamma(0) */
      }
      return (int)lround( ((double)n/(double)(n-r)) * exp(lgamma(n) - lgamma(n-r)));
   }
}

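To compile, do: gcc myFile.c -lm && ./a.out (put -lm after the source file so the linker can resolve lgamma, exp and lround from the math library).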

Note that the accuracy of your results is limited by the precision of the double type: a typical IEEE 754 double has a 53-bit significand, so not every integer above 2^53 can be represented, and rounding in exp and lgamma can cost some accuracy even before that. You should get good results for moderate inputs, but be warned: replacing all the ints above with unsigned long long will not guarantee accurate results for larger values of n and r, because the computation still goes through double. At some point you will still need a math library that handles arbitrarily large values, but this should let you avoid one for smaller inputs.

Answer 2:

I think you have two choices:

1. Use a big integer library. This way you won't lose precision (floating point might work for some cases, but is a poor substitute).
2. Restructure your functions so they never reach large intermediate values. E.g. factorial(x)/factorial(y) is the product of all numbers from y+1 to x, so just write a loop and multiply. This way, you'll only get an overflow if the final result overflows.
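The loop from the second option might look like this. A minimal sketch, assuming unsigned long long as the result type; nPr_loop is a hypothetical name, and the overflow test before each multiplication is an addition that turns silent wraparound into an explicit failure.

```c
#include <limits.h>
#include <stdbool.h>

/* nPr = n * (n-1) * ... * (n-r+1), multiplied directly so no factorial
   is ever formed. Returns false (leaving *out untouched) when r > n or
   when the product would not fit in unsigned long long. */
bool nPr_loop(unsigned n, unsigned r, unsigned long long *out) {
    if (r > n)
        return false;
    unsigned long long result = 1;
    for (unsigned i = 0; i < r; i++) {
        unsigned long long k = n - i;    /* k runs from n down to n-r+1 */
        if (result > ULLONG_MAX / k)     /* result * k would overflow */
            return false;
        result *= k;
    }
    *out = result;
    return true;
}
```

For example, nPr_loop(30, 1, &v) stores 30 in v, while nPr_loop(30, 30, &v) returns false because 30! does not fit in 64 bits.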

Answer 3:

If you don't have to deal with signed values (and it doesn't appear that you do), you could try using a larger integral type, e.g., unsigned long long. If that doesn't suffice, you'd need to use a non-standard library that supports arbitrarily long integers. Note that the long long type requires C99 compiler support (with older GCC versions you might have to compile with -std=c99).

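Edit: you might be able to fit more into a long double, which is 80 bits wide on some systems (x87). Its 64-bit significand still limits exact integer results to about 2^64; larger values are only approximate.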
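To see how far a wider integral type goes with the factorial-based formulas, one can count how many factorials fit before an overflow check fails. A minimal sketch; largest_factorial_ull is a hypothetical helper, not part of the original answer.

```c
#include <limits.h>

/* Hypothetical helper: returns the largest k such that k! fits in
   unsigned long long. The test f <= ULLONG_MAX / (k + 1) guarantees
   that f * (k + 1) cannot overflow before the multiplication runs. */
unsigned largest_factorial_ull(void) {
    unsigned long long f = 1;   /* f == k! */
    unsigned k = 1;
    while (f <= ULLONG_MAX / (k + 1)) {
        f *= k + 1;
        k++;
    }
    return k;
}
```

When unsigned long long is 64 bits wide (the common case), this returns 20, so factorial(n)/factorial(n-r) in that type still breaks for n > 20 even when the final result is small.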

Answer 4:

I might be being dense, but it seems to me that going to doubles and the gamma function is overkill here.

Are there any workarounds that don't require the use of a library to support arbitrarily large numbers?

Sure there are. You know exactly what you're dealing with at all times - products of ranges of integers. A range of integers is a special case of a finite list of integers. I have no idea what an idiomatic way of representing a list is in C, so I'll stick to C-ish pseudocode:

make_list(from, to)
    return a list containing from, from+1, ..., to

concatenate_lists(list1, list2)
    return a list with all the elements from list1 and list2

calculate_list_product(list)
    return list[0] * list[1] * ... * list[last]

calculate_list_quotient(numerator_list, denominator_list)
    /* 
    be a bit clever here: form the product of the elements of numerator_list, but
    any time the running product is divisible by an element of denominator_list,
    divide the running product by that element and remove it from consideration
    */

n_P_r(n, r)
    /* nPr is n! / (n-r)!
       which simplifies to n * (n-1) * ... * (n-r+1)
       so we can just: */
    return calculate_list_product(make_list(n-r+1, n))

n_C_r(n, r)
    /* nCr is n! / (r! (n-r)!) */
    return calculate_list_quotient(
        make_list(1, n),
        concatenate_lists(make_list(1, r), make_list(1, n-r))
    )

Note that we never actually calculate a factorial!
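One way to make the list-quotient pseudocode concrete in C, without a list type, is to keep the numerator list in an array and cancel denominators before multiplying. This is a variant of the idea, not the author's exact algorithm: it first cancels the (n-r)! factor so the numerator list is n-r+1..n and the denominator list is 1..r, then divides each denominator element out of the numerator elements with a gcd (this always succeeds, because nCr is an integer), and only multiplies at the end. The names nCr_cancel and gcd_ull are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Euclid's algorithm. */
static unsigned long long gcd_ull(unsigned long long a, unsigned long long b) {
    while (b != 0) {
        unsigned long long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Returns false if r > n, allocation fails, or nCr overflows. */
bool nCr_cancel(unsigned n, unsigned r, unsigned long long *out) {
    if (r > n)
        return false;
    if (r > n - r)
        r = n - r;                            /* C(n, r) == C(n, n - r) */
    unsigned long long *num = malloc((r ? r : 1) * sizeof *num);
    if (num == NULL)
        return false;
    for (unsigned i = 0; i < r; i++)
        num[i] = n - r + 1 + i;               /* numerator list n-r+1..n */
    /* Cancel each denominator element 2..r against the numerator list. */
    for (unsigned d = 2; d <= r; d++) {
        unsigned long long rem = d;
        for (unsigned i = 0; i < r && rem > 1; i++) {
            unsigned long long g = gcd_ull(rem, num[i]);
            num[i] /= g;
            rem /= g;
        }
    }
    /* Multiply what is left; every element is at least 1. */
    unsigned long long result = 1;
    bool ok = true;
    for (unsigned i = 0; i < r; i++) {
        if (result > ULLONG_MAX / num[i]) {   /* result * num[i] overflows */
            ok = false;
            break;
        }
        result *= num[i];
    }
    free(num);
    if (ok)
        *out = result;
    return ok;
}
```

Because every denominator has been cancelled before any multiplication, the final product is exactly nCr, so the overflow check fires only when nCr itself does not fit in unsigned long long.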

Answer 5:

Here is a way to calculate nCr without gamma functions. It relies on the identities nCr = (n/r) * (n-1)C(r-1) and nC0 = 1 for any n >= 0, which lead directly to a recursive function like the one below:

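/* C(n, r) = n * C(n-1, r-1) / r. The product n * C(n-1, r-1) equals
   r * C(n, r), so the integer division is exact, although the product
   itself can still overflow. Returns 0 if r > n. */
unsigned long long combination(unsigned long long n, unsigned long long r) {
    if (r > n)
        return 0;
    if (r == 0)
        return 1;
    else {
        unsigned long long num = n * combination(n - 1, r - 1);
        return num / r;
    }
}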
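The recursion can also be unrolled into a loop, which avoids deep call stacks. A sketch, assuming unsigned long long; nCr_iter is a hypothetical name. After step i the running value equals C(n-r+i, i), so each division by i is exact.

```c
#include <limits.h>
#include <stdbool.h>

/* Iterative form of C(n, r) = n/r * C(n-1, r-1), starting from C(n-r, 0) = 1.
   Returns false when r > n or when an intermediate product overflows. */
bool nCr_iter(unsigned n, unsigned r, unsigned long long *out) {
    if (r > n)
        return false;
    if (r > n - r)
        r = n - r;                       /* C(n, r) == C(n, n - r) */
    unsigned long long result = 1;
    for (unsigned i = 1; i <= r; i++) {
        unsigned long long k = n - r + i;
        if (result > ULLONG_MAX / k)     /* result * k would overflow */
            return false;
        result = result * k / i;         /* now result == C(n - r + i, i) */
    }
    *out = result;
    return true;
}
```

The check is conservative: the intermediate product result * k can be up to r times the final C(n, r), so this may return false in some cases where the final value would still fit.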

Source: https://stackoverflow.com/questions/11016069/avoiding-interger-overflow-with-permutation-npr-ncr-functions-in-c

Tags

math

statistics