Efficient Modulo 3 operation? [duplicate]

Question

Possible Duplicate:
Fast modulo 3 or division algorithm?

It's well known that modulo arithmetic can be a significant drag on performance. Does anyone know of a good alternative for x%3 operations? I know that one exists for x%2, but I really need one for modulo 3, since I want to alternate between three buffers in a for loop.

Thanks!

Answer 1:

Instead of the usual "measure it" advice, here is an actual answer, because the math behind it is genuinely fun. That said, the compiler can, and probably does, perform this optimization itself (at least modern optimizing C++ compilers do; javac certainly won't, and I have no idea whether the JVM does), so check first whether the work is already being done for you.

It's still fun to know the theory behind the optimization. I'll use assembly because we need the high 32-bit word of a multiplication. The following is from Warren's book on bit twiddling (Hacker's Delight):

n is the input integer whose remainder we want:

li M, 0x55555556   ; load magic number (2^32 + 2) / 3
mulhs q, M, n      ; q = high word of M * n, i.e. q = floor(M*n / 2^32)
shri t, n, 31      ; t = 1 if n is negative, else 0
add q, q, t        ; add 1 to q if n is negative

Here q contains the quotient of n / 3, so we just compute the remainder as usual: r = n - q*3
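For anyone who prefers to try this outside assembly, here is a minimal C++ sketch of the same four steps. The function names `div3_magic` and `mod3_magic` are made up for illustration, and the high word is obtained from a 64-bit product rather than a dedicated multiply-high instruction. It assumes that right-shifting a negative `int64_t` is an arithmetic shift, which is guaranteed from C++20 and is what mainstream compilers do before that.

```cpp
#include <cstdint>

// Hypothetical helper: quotient n / 3 (truncated toward zero) via the magic number.
inline std::int32_t div3_magic(std::int32_t n) {
    const std::int64_t M = 0x55555556;  // (2^32 + 2) / 3
    // High 32 bits of the signed 64-bit product, i.e. floor(M*n / 2^32).
    // Assumption: >> on a negative int64_t is an arithmetic shift.
    std::int32_t q = static_cast<std::int32_t>((M * n) >> 32);
    // Logical shift of n's bit pattern: 1 if n is negative, else 0.
    q += static_cast<std::int32_t>(static_cast<std::uint32_t>(n) >> 31);
    return q;
}

// Hypothetical helper: remainder with the same sign convention as n % 3 in C++.
inline std::int32_t mod3_magic(std::int32_t n) {
    return n - div3_magic(n) * 3;  // r = n - q*3
}
```

With these definitions, `mod3_magic(7)` gives 1 and `mod3_magic(-7)` gives -1, matching what `%` returns in C++.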

The math is the interesting part - latex would be rather cool here:

q = Floor( (2^32+2)/ 3 * (n / 2^32) ) = Floor( n/3 + 2*n/(3*2^32) )

Now for n = 2^31-1 (the largest n possible for signed 32-bit integers) the error term is less than 1/3 (and non-negative), which makes it quite easy to show that the result is indeed correct. For n = -2^31 we have the correction by 1 above, and if you simplify that you'll see that the error term is always larger than -1/3, which means it holds for negative numbers as well.

I leave the proof with the error term bounds for the interested - it's not that hard.
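For readers who want the bounds spelled out, one possible way to fill in the argument is sketched below. It works directly with the unsimplified error term from the formula above, so the lower bound for negative n is written as "at least -1/3". The symbols M and e(n) are shorthand introduced here, not notation from the book.

```latex
\begin{align*}
M &= \frac{2^{32}+2}{3}, \qquad
\frac{M n}{2^{32}} = \frac{n}{3} + e(n), \qquad
e(n) = \frac{2n}{3 \cdot 2^{32}} \\
0 \le n < 2^{31} &\implies 0 \le e(n) < \tfrac{1}{3}
  \implies \left\lfloor \tfrac{n}{3} + e(n) \right\rfloor = \left\lfloor \tfrac{n}{3} \right\rfloor \\
-2^{31} \le n < 0 &\implies -\tfrac{1}{3} \le e(n) < 0
  \implies \left\lfloor \tfrac{n}{3} + e(n) \right\rfloor + 1 = \left\lceil \tfrac{n}{3} \right\rceil
\end{align*}
```

Both implications rely on the fractional part of n/3 being 0, 1/3 or 2/3. Adding a non-negative amount below 1/3 can never reach the next integer, and subtracting a positive amount of at most 1/3 always lands in the interval one below the ceiling. For negative n the ceiling is the quotient truncated toward zero, which is what the correction by 1 produces.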

Answer 2:

If it's in a straight loop, no need to calculate a modulo. Hold a second int var that you reset every 3 steps.

int i, bn = 0;  /* bn = current buffer number: 0, 1 or 2 */

for(i=0; i<whatever; i++) {
  ...
  if(++bn == 3) bn = 0;  /* wrap around instead of computing i % 3 */
}

And that is not a premature optimisation, it's avoiding unnecessary calculation.

EDIT: The OP stated that he was using a loop to switch between buffers, so my solution looks quite appropriate. As for the downvote, if it was a mistake, no problem.
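For the buffer-switching case, the counter approach might look roughly like the following C++ sketch. The function `buffer_schedule`, the `std::vector<int>` buffers and the `push_back` "work" are all placeholders invented for this example. It returns the buffer index used on each iteration so the rotation is easy to inspect.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical example: rotate through three buffers without using %.
std::vector<std::size_t> buffer_schedule(std::size_t iterations) {
    std::array<std::vector<int>, 3> buffers;  // stand-ins for the real buffers
    std::vector<std::size_t> used;            // which buffer each iteration touched
    std::size_t bn = 0;                       // current buffer number: 0, 1 or 2
    for (std::size_t i = 0; i < iterations; ++i) {
        buffers[bn].push_back(static_cast<int>(i));  // placeholder for real work
        used.push_back(bn);
        if (++bn == 3) bn = 0;  // wrap around instead of computing i % 3
    }
    return used;
}
```

Calling `buffer_schedule(7)` yields the indices 0, 1, 2, 0, 1, 2, 0, which is the same sequence `i % 3` would produce.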

Answer 3:

If 3 is known at compile time, then the compiler will generate the 'tricks' to do it as efficiently as possible. Modulo takes much longer when the divisor is unknown until run-time.
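A small sketch to see the difference for yourself: the two functions below (names invented for this example) compute the same thing, but only the first has a divisor visible at compile time. Compiling with optimization and inspecting the assembly, for instance with `g++ -O2 -S`, should typically show a multiply-and-shift sequence for the first and a division instruction for the second. The exact output depends on compiler and target.

```cpp
// Divisor is a literal: optimizing compilers typically replace the
// division with a multiplication by a magic constant plus shifts.
int mod3_constant(int x) {
    return x % 3;
}

// Divisor is only known at run time: the compiler generally has to
// emit a real division instruction here.
int mod_runtime(int x, int d) {
    return x % d;
}
```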

Source: https://stackoverflow.com/questions/8141802/efficient-modulo-3-operation

Tags

java

c++

performance

modulo