C: avoiding overflows when working with big numbers

问题

I've implemented some sorting algorithms (to sort integers) in C, carefully using uint64_t to store anything which has got to do with the data size (thus also counters and stuff), since the algorithms should be tested also with data sets of several giga of integers.

The algorithms should be fine, and there should be no problems about the amount of data allocated: data is stored on files, and we only load little chunks per time, everything works fine even when we choke the in-memory buffers to any size.

Tests with datasets up to 4 giga ints (thus 16GB of data) work fine (sorting 4Gint took 2228 seconds, ~37 minutes), but when we go above that (ie: 8 Gints) the algorithm doesn't seem to halt (it's been running for about 16 hours now).

I'm afraid the problem could be due to integer overflow, maybe a counter in a loop is stored on a 32 bits variable, or maybe we're calling some functions that works with 32 bits integers.
What else could it be?

Is there any easy way to check whether an integer overflow occurs at runtime?

回答1:

This is compiler-specific, but if you're using gcc then you can compile with -ftrapv to issue SIGABRT when signed integral overflow occurs.

For example:

/* compile with gcc -ftrapv <filename> */
#include <signal.h>
#include <stdio.h>
#include <limits.h>

void signalHandler(int sig) {
  printf("Overflow detected\n");
}

int main() {
  signal(SIGABRT, &signalHandler);

  int largeInt = INT_MAX;
  int normalInt = 42;
  int overflowInt = largeInt + normalInt;  /* should cause overflow */

  /* if compiling with -ftrapv, we shouldn't get here */
  return 0;
}

When I run this code locally, the output is

Overflow detected
Aborted

回答2:

Take a look at -ftrapv and -fwrapv:

-ftrapv

This option generates traps for signed overflow on addition, subtraction, multiplication operations.

-fwrapv

This option instructs the compiler to assume that signed arithmetic overflow of addition, subtraction and multiplication wraps around using twos-complement representation. This flag enables some optimizations and disables other. This option is enabled by default for the Java front-end, as required by the Java language specification.

See also Integer overflow in C: standards and compilers, and Useful GCC flags for C.

回答3:

If you are using Microsoft's compiler, there are options to generate code that triggers a SEH exception when an integer conversion cuts off non-zero bits. In places where this is actually desired, use a bitwise AND to remove the upper bits before doing the conversion.

回答4:

clang now support dynamic overflow checks for both signed and unsigned integers. See -fsanitize=integer switch. For now it is only one C++ compiler with fully supported dynamic overflow checking for debug purpose.

回答5:

The only sure fire way is to wrap operations on those integers into functions that perform bounds violation checking. This will of course slow down integer ops, but if your code asserts or halts on a boundary violation with a meaningful error message, that will go a long way towards helping you identify where the problem is.

As for your particular issue, keep in mind that general case sorting is O(nlogn), so the reason the algorithm is taking much longer could be due to the fact that the increase in time is not linear with respect to the data set size. Since also didn't mention how much physical memory is in the box and how much of it is used for your algorithm, there could possibly be page faulting to disk with the larger data set, thus potentially slowing things to a crawl.

来源：https://stackoverflow.com/questions/5005379/c-avoiding-overflows-when-working-with-big-numbers

标签

Linux

64-bit

overflow