问题
I need to optimize some code where I multiply a vector of ints (32 bit) by a scalar modulo p (where p is the prime number (2^32)-5) and then subtract that vector from another vector modulo p.
The code looks like this:
public static void multiplyAndSubtract(long fragmentCoefficient, long[] equationToSubtractFrom, long[] equationToSubtract) {
    for (int i = 0; i < equationToSubtractFrom.length; i++) {
        equationToSubtractFrom[i] =  modP(equationToSubtractFrom[i] - multiplyModP(fragmentCoefficient, equationToSubtract[i]));
    }
}
I'm using longs because Java doesn't support unsigned integers but both vectors are mod p so you can expect every number to be 0 <= x < (2^32)-5
Any ideas to optimize this? The mod p operation is what's taking up most of the execution time so one way to optimize this could to somehow not do modP after the multiplication and only do it after the subtraction. Any ideas on how to do that?
回答1:
It is possible to speed up calculation and avoid any division at all using the fact that 2^32 = 5 (mod p).
After multiplication and subtraction, split the result to low (x%2^32) and hi (x / 2^32) parts. Then multiply the hi part to 5 and sum with the low part. Then repeat this procedure once more. If the result is greater than p, subtract p. For negative result, add p.
Edit: Since combined multiplication and subtraction may overflow, the result of multiplication should be also taken modulo p. But only one step of the above procedure is enough: just split, multiply to 5, and add.
回答2:
e - (f * e mod p) mod p = (e-e f) mod p
See Wolfram Alpha.
回答3:
I know of two ways to do this without using division or modulus:
Method 1: Invariant Multiplication. (see this paper)
The basic idea here is to pre-compute and approximation of the reciprocal of the p which allows you do an integer division using just a couple of integer multiplications. Then you can multiply back and obtain your modulus. This is the easier approach to implement.
Method 2: (the one that I usually use), is to use floating-point. Convert the numbers to floating-point and multiply by a pre-computed reciprocal of p. Then round and convert back to an integer. This approach is harder to get right, but from my experience it's faster if done properly.
Both approaches here involve no divisions aside from a pre-computation of the reciprocal either in integer or floating-point.
Whether or not either of these methods will be faster than straight-forward use of %, will depend on how well you can implement them. So I can't guarantee that either one will be faster.
来源:https://stackoverflow.com/questions/7914637/fast-multiplication-and-subtraction-modulo-a-prime