Is there a bit-wise trick for checking the divisibility of a number by 2 or 3?

问题

I am looking for a bit-wise test equivalent to (num%2) == 0 || (num%3) == 0.

I can replace num%2 with num&1, but I'm still stuck with num%3 and with the logical-or.

This expression is also equivalent to (num%2)*(num%3) == 0, but I'm not sure how that helps.

回答1:

Yes, though it's not very pretty, you can do something analogous to the old "sum all the decimal digits until you have only one left" trick to test if a number is divisible by 9, except in binary and with divisibility by 3. You can use the same principle for other numbers as well, but many combinations of base/divisor introduce annoying scaling factors so you're not just summing digits anymore.

Anyway, 16ⁿ-1 is divisible by 3, so you can use radix 16, that is, sum the nibbles. Then you're left with one nibble (well, 5 bits really), and you can just look that up. So for example in C# (slightly tested) edit: brute-force tested, definitely works

static bool IsMultipleOf3(uint x)
{
    const uint lookuptable = 0x49249249;
    uint t = (x & 0x0F0F0F0F) + ((x & 0xF0F0F0F0) >> 4);
    t = (t & 0x00FF00FF) + ((t & 0xFF00FF00) >> 8);
    t = (t & 0x000000FF) + ((t & 0x00FF0000) >> 16);
    t = (t & 0xF) + ((t & 0xF0) >> 4);
    return ((lookuptable >> (int)t) & 1) != 0;
}

The trick from my comment, x * 0xaaaaaaab <= 0x55555555, works through a modular multiplicative inverse trick. 0xaaaaaaab * 3 = 1 mod 2³², which means that 0xaaaaaaab * x = x / 3 if and only if
x % 3 = 0. "if" because 0xaaaaaaab * 3 * y = y (because 1 * y = y), so if x is of the form
3 * y then it will map back to y. "only if" because no two inputs are mapped to the same output, so everything not divisible by 3 will map to something higher than the highest thing you can get by dividing anything by 3 (which is 0xFFFFFFFF / 3 = 0x55555555).

You can read more about this (including the more general form, which includes a rotation) in Division by Invariant Integers using Multiplication (T. Granlund and P. L. Montgomery).

You compiler may not know this trick. For example this:

uint32_t foo(uint32_t x)
{
    return x % 3 == 0;
}

Becomes, on Clang 3.4.1 for x64,

movl    %edi, %eax
movl    $2863311531, %ecx       # imm = 0xAAAAAAAB
imulq   %rax, %rcx
shrq    $33, %rcx
leal    (%rcx,%rcx,2), %eax
cmpl    %eax, %edi
sete    %al
movzbl  %al, %eax
ret

G++ 4.8:

mov eax, edi
mov edx, -1431655765
mul edx
shr edx
lea eax, [rdx+rdx*2]
cmp edi, eax
sete    al
movzx   eax, al
ret

What it should be:

imul eax, edi, 0xaaaaaaab
cmp eax, 0x55555555
setbe al
movzx eax, al
ret

回答2:

I guess I'm a bit late to this party, but here's a slightly faster (and slightly prettier) solution than the one from harold:

bool is_multiple_of_3(std::uint32_t i)
{
    i = (i & 0x0000FFFF) + (i >> 16);
    i = (i & 0x00FF) + (i >> 8);
    i = (i & 0x0F) + (i >> 4);
    i = (i & 0x3) + (i >> 2);
    const std::uint32_t lookuptable = 0x49249249;
    return ((lookuptable >> i) & 1) != 0;
}

It's C++11, but that doesn't really matter for this piece of code. It's also brute-force tested for 32-bit unsigned ints. It saves you at least one bit-fiddling op for each of the first four steps. It also scales beautifully to 64 bits - only one additional step needed at the beginning.

The last two lines are obviously and shamelessly taken from harold's solution (nice one, I wouldn't have done that so elegantly).

Possible further optimizations:

The & ops in the first two steps will be optimized away by just using the lower-half registers on architectures that have them (x86, for example).
The largest possible output from the third step is 60, and from the fourth step it's 15 (when the function argument is 0xFFFFFFFF). Given that, we can eliminate the fourth step, use a 64-bit lookuptable and shift directly into that following the third step. This turns out to be a bad idea for Visual C++ 2013 in 32-bit mode, as the right shift turns into a non-inline call to code that does a lot of tests and jumps. However, it should be a good idea if 64-bit registers are natively available.
The point above needs to be reevaluated if the function is modified to take a 64-bit argument. The maximum outputs from the last two steps (which will be steps 4 and 5 after adding one step at the beginning) will be 75 and 21 respectively, which means we can no longer eliminate the last step.

The first four steps are based on the fact that a 32-bit number can be written as

(high 16 bits) * 65536 + (low 16 bits) = 
(high 16 bits) * 65535 + (high 16 bits) + (low 16 bits) = 
(high 16 bits) * 21845 * 3 + ((high 16 bits) + (low 16 bits))

So the whole thing is divisible by 3 if and only if the right parenthesis is divisible by 3. And so on, as this holds for 256 = 85 * 3 + 1, 16 = 5 * 3 + 1, and 4 = 3 + 1. (Of course, this is generally true for even powers of two; odd powers are one less than the nearest multiple of 3.)

The numbers that are input into the following steps will be larger than 16-bit, 8-bit, and 4-bit respectively in some cases, but that's not a problem, as we're not dropping any high-order bits when shifting right.

来源：https://stackoverflow.com/questions/27846924/is-there-a-bit-wise-trick-for-checking-the-divisibility-of-a-number-by-2-or-3

标签

java

python

c++

bit-manipulation