Are there any limits to GMP?

Submitted by 北慕城南 on 2021-01-27 19:22:20

Question


All the documentation for GMP seems to imply that there are no limits. Is this really true?

I want to do some simple integer maths (add, shift, xor, multiply, divide, etc.) but with truly enormous numbers, up to 2^2^96 (that is, 2^79,228,162,514,264,337,593,543,950,336, a number whose plain binary representation needs orders of magnitude more memory than your computer has) or even 2^2^256. If I go to the trouble of getting GMP and coding against it, would it raise its eyebrows at me for asking for such extraordinary numbers, or will it just work, as the hype suggests?

I hope to use it with Java, so I would probably use GMP through a JNI wrapper, but I am not really picky about language. Python looks like it can work with GMP too.
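
For reference, the operations listed above map onto GMP's C API roughly as follows; the JNI and Python bindings mentioned are wrappers around these same functions. A minimal sketch:

    #include <gmp.h>
    #include <stdio.h>

    int main(void) {
        mpz_t a, b, r;
        mpz_init_set_ui(a, 12345);
        mpz_init_set_ui(b, 678);
        mpz_init(r);

        mpz_add(r, a, b);          /* r = a + b                      */
        mpz_mul_2exp(r, r, 10);    /* r = r << 10 (left shift)       */
        mpz_xor(r, r, b);          /* r = r ^ b                      */
        mpz_mul(r, r, a);          /* r = r * a                      */
        mpz_tdiv_q(r, r, b);       /* r = r / b (truncating divide)  */

        gmp_printf("result = %Zd\n", r);

        mpz_clears(a, b, r, NULL);
        return 0;
    }

Build with gcc example.c -lgmp. None of these calls care how big the operands are.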


Answer 1:


Are there any limits to GMP?

Yes there are. In two respects.

  • Really large numbers require lots of memory. @hexafraction's answer explores this.

  • Operations on really large numbers take a really long time. For instance, adding two N-bit numbers requires O(N) operations. Multiplying two N-bit numbers is super-linear¹. (Assuming non-compressed representations ...) A rough timing sketch follows this list.

    OK, so this is not a limit in the sense that you run up against a hard barrier. But if your program would take an impossibly long time to run, that is clearly a practical limitation.
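
As a rough illustration of the time cost, here is a small GMP program that multiplies random operands of doubling size and times each mpz_mul. It is a sketch only, and absolute timings will depend on your machine and GMP version:

    #include <gmp.h>
    #include <stdio.h>
    #include <time.h>

    int main(void) {
        gmp_randstate_t st;
        gmp_randinit_default(st);
        gmp_randseed_ui(st, 42);

        mpz_t a, b, c;
        mpz_inits(a, b, c, NULL);

        /* double the operand size each round; the time per mpz_mul
           grows faster than the operand size does */
        for (unsigned long bits = 1UL << 20; bits <= (1UL << 26); bits <<= 1) {
            mpz_urandomb(a, st, bits);
            mpz_urandomb(b, st, bits);

            clock_t t0 = clock();
            mpz_mul(c, a, b);
            clock_t t1 = clock();

            printf("%9lu bits: %.4f s\n", bits,
                   (double)(t1 - t0) / CLOCKS_PER_SEC);
        }

        mpz_clears(a, b, c, NULL);
        gmp_randclear(st);
        return 0;
    }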

There was also some discussion about whether GMP does compression or not. There are a number of ways to answer that:

  • Look at the GMP source code. (@hexafraction says the answer is "no compression")

  • Try an experiment. Write a little program to create (say) 2^1,000,000,000 by left-shifting 1, and use top or the equivalent to see how much memory the program uses. (A minimal version of such a program is sketched after this list.)

  • Consider the impact of compression on arithmetic operations. In fact, the last approach is probably the most instructive. It will tell you if it is feasible for a general purpose (or special purpose) bignum library to use compression.
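
A minimal sketch of that experiment using GMP's C API (mpz_mul_2exp is GMP's left shift):

    #include <gmp.h>
    #include <stdio.h>

    int main(void) {
        mpz_t x;
        mpz_init_set_ui(x, 1);

        /* x = 1 * 2^1000000000, i.e. 1 left-shifted by a billion bits */
        mpz_mul_2exp(x, x, 1000000000UL);

        printf("bits in x: %zu\n", mpz_sizeinbase(x, 2));  /* 1,000,000,001 */
        printf("inspect this process with top, then press Enter\n");
        getchar();   /* keep x alive while you look at the memory usage */

        mpz_clear(x);
        return 0;
    }

If GMP compressed its representation, the process would stay tiny; in practice you should see on the order of 125 MB resident, which is simply the 10^9 bits stored verbatim.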

1 - Naive long multiplication is O(N^2), but there are algorithms with better asymptotic performance. For numbers in the region of 2^2^96, you should be looking at the Schönhage–Strassen algorithm, or Fürer's algorithm. In general, the Wikipedia page on multiplication algorithms is a good place to start reading.

Arithmetic using compressed bignums

Let's assume that the reason we are doing this is that the numbers are too big to represent in uncompressed form. So uncompressing the operands, doing the operation, and compressing the result ... is not a viable option.

If you try to apply a normal arithmetic algorithm to compressed numbers, you need to be able to incrementally decompress the inputs, perform the operation, and compress the output. Is that feasible? Well it depends on the details. For example:

  • To add two numbers, you start at the least significant end, add corresponding bits, carry, and repeat. The complete operation requires one pass through the input numbers. That would work if your compression scheme is (say) a sparse array of bits, but if you used run-length encoding, then you'd need the runs encoded from the least significant end up to the most significant. (A limb-at-a-time version of this single pass is sketched after this list.)

  • To multiply two numbers, you basically do an N-bit shift-and-add sequence N times. That can also be done incrementally. But note that we are doing the incremental decompress / compress on each of the shift-and-add cycles ...

  • To divide ... you do N-bit shift-and-subtract N times. Same as above.
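
For reference, here is what that single least-significant-first pass looks like on ordinary uncompressed limb arrays; a sketch, not GMP's actual mpn code. An incremental decompress / compress scheme would have to produce and consume limbs in exactly this order:

    #include <stdint.h>
    #include <stddef.h>

    /* r[0..n] = a[0..n-1] + b[0..n-1]; limbs are stored least significant
       first. Returns the number of limbs written (n, or n + 1 if there is
       a final carry out). */
    size_t add_limbs(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n) {
        uint64_t carry = 0;
        for (size_t i = 0; i < n; i++) {
            uint64_t sum = a[i] + carry;
            carry = (sum < carry);      /* did adding the carry overflow? */
            sum += b[i];
            carry += (sum < b[i]);      /* did adding b[i] overflow?      */
            r[i] = sum;
        }
        r[n] = carry;
        return n + (carry != 0);
    }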

But there are two problems:

  • The compression / decompression adds an overhead to all of these operations. Assuming you've chosen a suitable compression scheme, the overhead will be a constant multiplier per bit compressed / decompressed.

  • The second problem is whether the compression scheme will actually be effective, on the inputs and output, AND on the intermediate results in the more complex operations.

So is there an alternative?

Well, potentially yes. If you use run-length encoding, you could write (say) an addition algorithm that takes the "runs" into account. For instance:

     10000000000000001
    +10000000000000001
  • Add the rightmost (least significant) pair of digits

                    10
    
  • Add the matching runs of zeros

      0000000000000010
    
  • Add the MSBs

    100000000000000010
    

And then you could build up the more complicated operations from that.

The advantage of this approach (if you can pull it off) is that for suitable inputs it will reduce the complexity of the computation. For example, addition is now better than O(N). (I think it should actually be proportional to the size of the run-length encoded representation ...)

But once again, this makes the operations more complicated, and will only be effective if the average length of the runs is large enough to compensate. For numbers that don't compress well enough it will be an anti-optimization.
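
Here is a sketch of that idea using an even simpler compressed form, a sorted list of set-bit positions, instead of run-length encoding; the representation and names are illustrative assumptions, not anything GMP provides. An addition costs time proportional to the number of 1-bits in the operands rather than to N:

    #include <stdint.h>
    #include <stddef.h>

    /* A number is a sorted (ascending) array of set-bit positions.
       Positions are uint64_t here for simplicity; for numbers like 2^2^96
       the positions themselves would have to be multi-precision.
       out must have room for na + nb entries; returns entries written. */
    size_t sparse_add(uint64_t *out,
                      const uint64_t *a, size_t na,
                      const uint64_t *b, size_t nb) {
        size_t i = 0, j = 0, n = 0;
        int have_carry = 0;
        uint64_t carry_pos = 0;

        while (i < na || j < nb || have_carry) {
            /* smallest bit position still pending in a, b, or the carry */
            uint64_t p = UINT64_MAX;
            if (i < na && a[i] < p) p = a[i];
            if (j < nb && b[j] < p) p = b[j];
            if (have_carry && carry_pos < p) p = carry_pos;

            /* count and consume the 1-bits sitting at position p */
            int count = 0;
            if (i < na && a[i] == p) { count++; i++; }
            if (j < nb && b[j] == p) { count++; j++; }
            if (have_carry && carry_pos == p) { count++; have_carry = 0; }

            if (count & 1)        /* 1 or 3 ones: the result has a 1 at p */
                out[n++] = p;
            if (count >= 2) {     /* 2 or 3 ones: carry into p + 1        */
                have_carry = 1;
                carry_pos = p + 1;
            }
        }
        return n;
    }

For the example above, 10000000000000001 + 10000000000000001, each input is just {0, 16}; the loop runs four times and produces {1, 17}, i.e. 100000000000000010, no matter how long the runs of zeros in between are.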


In summary:

  1. The viability of this approach depends on how compressible the actual numbers are.

  2. It is doubtful that this is a viable approach in a general purpose "big number" library (like GMP). Typical big numbers that we encounter in a numeric context are not sufficiently compressible ... in a way that would help. And if compression doesn't help it probably hinders.

  3. This may be viable in a special purpose "big number" library, provided such a library existed. Under the right circumstances, the compressed arithmetic should have better complexity than ordinary bignum arithmetic.




Answer 2:


By design, yes. It will try to store and operate on any number you give it, though in many cases problems on the scale of yours become unreasonable.

In practice, however, there are limits set by the operating system and computer hardware.

2^2^96 takes 2^96 bits to represent in the best uncompressed case. That equates to a mere 9,904,000,000,000,000 terabytes. Your computer cannot store that much data. In addition, an array can only be indexed up to roughly 4 billion elements, nowhere near enough to manage this giant heap of data. To address each of these bits you would need a 4-billion-entry array of 4-billion-entry arrays of 4-billion-entry arrays, and I'm not totally sure that is even workable, since the total number of elements is far greater than 4 billion.

Anyway, your heap would max out at 4 GB on a 32-bit JVM. And even if you could store that many bits and process them at 4 GB/sec, the operation would take about 78,460,000,000 years.
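
The arithmetic behind those figures, as a quick back-of-the-envelope check (assuming 1 TB = 10^12 bytes and about 3.15 * 10^7 seconds in a year):

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double bits  = pow(2.0, 96.0);          /* bits needed for 2^(2^96)  */
        double bytes = bits / 8.0;
        double tb    = bytes / 1e12;            /* ~9.9e15 terabytes         */
        double years = (bytes / 4e9) / 3.15e7;  /* one pass at 4 GB/sec      */

        printf("storage needed : %.3e TB\n", tb);
        printf("one 4 GB/s pass: %.3e years\n", years);
        return 0;
    }

Both come out close to the figures quoted above.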

Even if the numbers could be compressed (they must still be at least partially decompressed for operations), you also need to take into account that the Kolmogorov complexity of that much data is not likely to be less than an entire terabyte for real-world numbers.




Answer 3:


While there are no limits at the mpn level, the size of an mpz_t is represented by an int, which is a 32-bit type on all platforms (at least those supported by GMP); see Integer Internals in the GMP manual. This implies that there is a limit of 2^37 bits on 64-bit platforms (an mpz_t integer will have fewer than 2^31 limbs of 64 = 2^6 bits).

Note: this limit of 2^37 bits was mentioned by Torbjörn Granlund on the gmp-discuss mailing list in April 2012.
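
The arithmetic behind that figure, spelled out as a tiny illustration (this just reproduces the derivation above; it is not a GMP API query):

    #include <stdio.h>

    int main(void) {
        unsigned long long max_limbs     = 1ULL << 31;  /* limb count capped by a 32-bit int */
        unsigned long long bits_per_limb = 64;          /* one limb on a 64-bit platform     */
        unsigned long long max_bits      = max_limbs * bits_per_limb;

        /* 2^31 limbs * 2^6 bits = 2^37 bits, i.e. a 16 GiB integer */
        printf("max mpz size: about %llu bits (%llu GiB)\n",
               max_bits, max_bits / 8 / (1ULL << 30));
        return 0;
    }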



Source: https://stackoverflow.com/questions/17385477/are-there-any-limits-to-gmp
