问题
I am looking for a bit-wise test equivalent to (num%2) == 0 || (num%3) == 0.
I can replace num%2 with num&1, but I'm still stuck with num%3 and with the logical-or.
This expression is also equivalent to (num%2)*(num%3) == 0, but I'm not sure how that helps.
回答1:
Yes, though it's not very pretty, you can do something analogous to the old "sum all the decimal digits until you have only one left" trick to test if a number is divisible by 9, except in binary and with divisibility by 3. You can use the same principle for other numbers as well, but many combinations of base/divisor introduce annoying scaling factors so you're not just summing digits anymore.
Anyway, 16n-1 is divisible by 3, so you can use radix 16, that is, sum the nibbles. Then you're left with one nibble (well, 5 bits really), and you can just look that up. So for example in C# (slightly tested) edit: brute-force tested, definitely works
static bool IsMultipleOf3(uint x)
{
const uint lookuptable = 0x49249249;
uint t = (x & 0x0F0F0F0F) + ((x & 0xF0F0F0F0) >> 4);
t = (t & 0x00FF00FF) + ((t & 0xFF00FF00) >> 8);
t = (t & 0x000000FF) + ((t & 0x00FF0000) >> 16);
t = (t & 0xF) + ((t & 0xF0) >> 4);
return ((lookuptable >> (int)t) & 1) != 0;
}
The trick from my comment, x * 0xaaaaaaab <= 0x55555555, works through a modular multiplicative inverse trick. 0xaaaaaaab * 3 = 1 mod 232, which means that 0xaaaaaaab * x = x / 3 if and only ifx % 3 = 0. "if" because 0xaaaaaaab * 3 * y = y (because 1 * y = y), so if x is of the form3 * y then it will map back to y. "only if" because no two inputs are mapped to the same output, so everything not divisible by 3 will map to something higher than the highest thing you can get by dividing anything by 3 (which is 0xFFFFFFFF / 3 = 0x55555555).
You can read more about this (including the more general form, which includes a rotation) in Division by Invariant Integers using Multiplication (T. Granlund and P. L. Montgomery).
You compiler may not know this trick. For example this:
uint32_t foo(uint32_t x)
{
return x % 3 == 0;
}
Becomes, on Clang 3.4.1 for x64,
movl %edi, %eax
movl $2863311531, %ecx # imm = 0xAAAAAAAB
imulq %rax, %rcx
shrq $33, %rcx
leal (%rcx,%rcx,2), %eax
cmpl %eax, %edi
sete %al
movzbl %al, %eax
ret
G++ 4.8:
mov eax, edi
mov edx, -1431655765
mul edx
shr edx
lea eax, [rdx+rdx*2]
cmp edi, eax
sete al
movzx eax, al
ret
What it should be:
imul eax, edi, 0xaaaaaaab
cmp eax, 0x55555555
setbe al
movzx eax, al
ret
回答2:
I guess I'm a bit late to this party, but here's a slightly faster (and slightly prettier) solution than the one from harold:
bool is_multiple_of_3(std::uint32_t i)
{
i = (i & 0x0000FFFF) + (i >> 16);
i = (i & 0x00FF) + (i >> 8);
i = (i & 0x0F) + (i >> 4);
i = (i & 0x3) + (i >> 2);
const std::uint32_t lookuptable = 0x49249249;
return ((lookuptable >> i) & 1) != 0;
}
It's C++11, but that doesn't really matter for this piece of code. It's also brute-force tested for 32-bit unsigned ints. It saves you at least one bit-fiddling op for each of the first four steps. It also scales beautifully to 64 bits - only one additional step needed at the beginning.
The last two lines are obviously and shamelessly taken from harold's solution (nice one, I wouldn't have done that so elegantly).
Possible further optimizations:
- The
&ops in the first two steps will be optimized away by just using the lower-half registers on architectures that have them (x86, for example). - The largest possible output from the third step is
60, and from the fourth step it's15(when the function argument is0xFFFFFFFF). Given that, we can eliminate the fourth step, use a 64-bitlookuptableand shift directly into that following the third step. This turns out to be a bad idea for Visual C++ 2013 in 32-bit mode, as the right shift turns into a non-inline call to code that does a lot of tests and jumps. However, it should be a good idea if 64-bit registers are natively available. - The point above needs to be reevaluated if the function is modified to take a 64-bit argument. The maximum outputs from the last two steps (which will be steps 4 and 5 after adding one step at the beginning) will be
75and21respectively, which means we can no longer eliminate the last step.
The first four steps are based on the fact that a 32-bit number can be written as
(high 16 bits) * 65536 + (low 16 bits) =
(high 16 bits) * 65535 + (high 16 bits) + (low 16 bits) =
(high 16 bits) * 21845 * 3 + ((high 16 bits) + (low 16 bits))
So the whole thing is divisible by 3 if and only if the right parenthesis is divisible by 3. And so on, as this holds for 256 = 85 * 3 + 1, 16 = 5 * 3 + 1, and 4 = 3 + 1. (Of course, this is generally true for even powers of two; odd powers are one less than the nearest multiple of 3.)
The numbers that are input into the following steps will be larger than 16-bit, 8-bit, and 4-bit respectively in some cases, but that's not a problem, as we're not dropping any high-order bits when shifting right.
来源:https://stackoverflow.com/questions/27846924/is-there-a-bit-wise-trick-for-checking-the-divisibility-of-a-number-by-2-or-3