Is -!(condition) a correct way to obtain a full-bitvector from a boolean (mask-boolean)?

Question

In removing conditional branches from high-performance code, converting a true boolean to unsigned long i = -1 to set all bits can be useful.

I came up with a way to obtain this integer mask from an int b (or bool b) that takes the value 1 or 0:

unsigned long boolean_mask = -(!b);

To get the opposite value:

unsigned long boolean_mask = -b;

Has anybody seen this construction before? Am I on to something? When an int value of -1 (which I assume -b or -(!b) does produce) is promoted to a bigger unsigned int type is it guaranteed to set all the bits?

Here's the context:

uint64_t ffz_flipped = ~i&~(~i-1); // least sig bit unset
// only set our least unset bit if we are not pow2-1
i |= (ffz_flipped < i) ? ffz_flipped : 0;

I will inspect the generated asm before asking questions like this next time. Sounds very likely the compiler will not burden the cpu with a branch here.
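As a rough sketch of what the mask form of this snippet could look like next to the ternary form (the function names are made up for illustration, and whether the compiler emits a branch for either version still has to be checked in the generated asm):

```cpp
#include <cstdint>

// Ternary form, as in the snippet above.
inline std::uint64_t set_lowest_zero_ternary(std::uint64_t i) {
    std::uint64_t ffz_flipped = ~i & ~(~i - 1);  // lowest unset bit of i
    i |= (ffz_flipped < i) ? ffz_flipped : 0;
    return i;
}

// Mask form: the comparison yields 0 or 1, and subtracting it from 0 in
// uint64_t gives 0 or all ones, which then selects ffz_flipped or 0.
inline std::uint64_t set_lowest_zero_masked(std::uint64_t i) {
    std::uint64_t ffz_flipped = ~i & ~(~i - 1);
    std::uint64_t mask = std::uint64_t{0} - static_cast<std::uint64_t>(ffz_flipped < i);
    i |= ffz_flipped & mask;
    return i;
}
```

The subtraction is done in the destination type on purpose, so no widening conversion is involved.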

Answer 1:

The question you should be asking yourself is this: If you write:

int it_was_true = b > c;

then it_was_true will be either 1 or 0. But where did that 1 come from?

The machine's instruction set doesn't contain an instruction of the form:

Compare R1 with R2 and store either 1 or 0 in R3

or, indeed, anything like that. (I put a note on SSE at the end of this answer, illustrating that the preceding statement is not quite true.) The machine has an internal condition register, consisting of several condition bits, and the compare instruction, like a number of other arithmetic operations, modifies those condition bits in specific ways. Subsequently, you can do a conditional branch, based on some condition bits, or a conditional load, and sometimes other conditional operations.

So actually, it could be a lot less efficient to store that 1 in a variable than it would have been to have directly done some conditional operation. Could have been, but maybe not, because the compiler (or at least, the guys who wrote the compiler) may well be cleverer than you, and it might just remember that it should have put a 1 into it_was_true so that when you actually get around to checking the value, the compiler can emit an appropriate branch or whatever.

So, speaking of clever compilers, you should take a careful look at the assembly code produced by:

uint64_t ffz_flipped = ~i&~(~i-1);

Looking at that expression, I can count five operations: three bitwise negations, one bitwise conjunction (and), and one subtract. But you won't find five operations in the assembly output (at least, if you use gcc -O3). You'll find three, plus a register move.

Before we look at the assembly output, let's do some basic algebra. Here's the most important identity:

-X == ~X + 1

Can you see why that's true? -X, in 2's complement, is just another way of saying 2ⁿ - X, where n is the number of bits in the word. In fact, that's why it's called "2's complement". What about ~X? Well, we can think of that as the result of subtracting every bit in X from the corresponding power of 2. For example, if we have four bits in our word, and X is 0101 (which is 5, or 2² + 2⁰), then ~X is 1010 which we can think of as 2³×(1-0) + 2²×(1-1) + 2¹×(1-0) + 2⁰×(1-1), which is exactly the same as 1111 − 0101. Or, in other words:

−X == 2ⁿ − X
~X == (2ⁿ − 1) − X

which means that

~X == (−X) − 1
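If you'd rather check these identities than take the algebra on trust, here is a small sketch that tries every value of an 8-bit word (so n = 8). The helper name is hypothetical, and the arithmetic is done in unsigned types so that everything wraps modulo 2ⁿ:

```cpp
#include <cstdint>

// Hypothetical helper: checks the identities for all 256 values of an 8-bit word.
inline bool twos_complement_identities_hold() {
    for (unsigned v = 0; v < 256; ++v) {
        const std::uint8_t x   = static_cast<std::uint8_t>(v);
        const std::uint8_t neg = static_cast<std::uint8_t>(0u - v);  // -X, i.e. 2^8 - X mod 2^8
        const std::uint8_t inv = static_cast<std::uint8_t>(~v);      // ~X
        if (neg != static_cast<std::uint8_t>(inv + 1)) return false; // -X == ~X + 1
        if (inv != static_cast<std::uint8_t>(neg - 1)) return false; // ~X == (-X) - 1
        if (inv != 255 - x) return false;                            // ~X == (2^8 - 1) - X
    }
    return true;
}
```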

Remember that we had

ffz_flipped = ~i&~(~i-1);

But we now know that we can change ~(~i−1) into minus operations, applying ~X == (−X) − 1 first to the outer ~ and then to ~i:

~(~i−1) == −(~i−1) − 1 == −(−i − 1 − 1) − 1 == (i + 2) − 1 == i + 1

How cool is that! We could have just written:

ffz_flipped = ~i & (i + 1);

which is only three operations, instead of five.
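A quick way to convince yourself that the two forms agree is to compare them directly. This is only a sketch, and the two function names are made up for the comparison:

```cpp
#include <cstdint>

// The five-operation form from the question.
inline std::uint64_t ffz_original(std::uint64_t i)   { return ~i & ~(~i - 1); }

// The three-operation form derived above.
inline std::uint64_t ffz_simplified(std::uint64_t i) { return ~i & (i + 1); }
```

Both isolate the lowest unset bit of i, and both give 0 when every bit of i is already set, because i + 1 wraps around to 0.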

Now, I don't know if you followed that, and it took me a bit of time to get it right, but now let's look at gcc's output:

    leaq    1(%rdi), %rdx     # rdx = rdi + 1
    movq    %rdi, %rax        # rax = rdi
    notq    %rax              # rax = ~rax
    andq    %rax, %rdx        # rdx &= rax

So gcc just went and figured all that out on its own.

The promised note about SSE: It turns out that SSE can do parallel comparisons, even to the point of doing 16 byte-wise comparisons at a time between two 16-byte registers. Condition registers weren't designed for that, and anyway no one wants to branch when they don't have to. So the CPU does actually change one of the SSE registers (a vector of 16 bytes, or 8 "words" or 4 "double words", whatever the operation says) into a vector of boolean indicators. But it doesn't use 1 for true; instead, it uses a mask of all 1s. Why? Because it's likely that the next thing the programmer is going to do with that comparison result is use it to mask out values, which I think is just exactly what you were planning to do with your -(!b) trick, except in the parallel streaming version.

So, rest assured, it's been covered.
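To make the SSE note concrete, here is a minimal sketch of a byte-wise compare followed by a mask, assuming an x86 target with SSE2 and the standard `<emmintrin.h>` intrinsics. The helper name `keep_matching_bytes` is hypothetical, and a scalar fallback with the same per-byte mask semantics is included for other targets:

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Zeroes every byte of data that differs from key; matching bytes are kept.
inline void keep_matching_bytes(std::uint8_t (&data)[16], std::uint8_t key) {
#if defined(__SSE2__) || defined(_M_X64)
    const __m128i v    = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data));
    const __m128i k    = _mm_set1_epi8(static_cast<char>(key));
    const __m128i mask = _mm_cmpeq_epi8(v, k);  // 0xFF where equal, 0x00 elsewhere
    _mm_storeu_si128(reinterpret_cast<__m128i*>(data), _mm_and_si128(v, mask));
#else
    // Scalar fallback: build the same all-ones-or-zero mask one byte at a time.
    for (std::size_t n = 0; n < 16; ++n) {
        const std::uint8_t mask = static_cast<std::uint8_t>(0u - (data[n] == key));
        data[n] = static_cast<std::uint8_t>(data[n] & mask);
    }
#endif
}
```

The comparison result is used directly as the mask, with no conversion from 1 to all ones needed.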

Answer 2:

Has anybody seen this construction before? Am I on to something?

Many people have seen it. It's old as rocks. It's not unusual but you should encapsulate it in an inline function to avoid obfuscating your code.

And, verify that your compiler is actually producing branches on the old code, and that it is configured properly, and that this micro-optimization actually improves performance. (And it's a good idea to keep notes on how much time each optimization strategy cuts.)

On the plus side, it is perfectly standard-compliant.

When an int value of -1 (which I assume -b or -(!b) does produce) is promoted to a bigger unsigned int type is it guaranteed to set all the bits?

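Yes, as long as the negation is done in a signed type. Converting a negative value to an unsigned type is defined modulo 2ⁿ, so an int -1 becomes the all-ones value of whatever unsigned type it is converted to. The case to watch is negating a narrower unsigned value: -1u is already a positive unsigned int, so widening it zero-extends and the high bits are not filled with ones.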

If you have different sizes and want to be explicit about the target type, try this:

template< typename uint >
uint mask_cast( bool f )
    { return static_cast< uint >( - ! f ); }
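Here is a sketch of how that template could be exercised across widths. The definition is repeated so the snippet stands alone, and the two helpers are hypothetical names added for the check. The second helper shows the one case worth watching: a negation carried out in unsigned int and only then widened.

```cpp
#include <cstdint>
#include <limits>

// Same shape as the mask_cast above, repeated so the sketch is self-contained.
template <typename uint>
uint mask_cast(bool f) { return static_cast<uint>(-!f); }

// Hypothetical check: true when mask_cast gives all ones for false and 0 for true.
template <typename uint>
bool mask_cast_fills_type() {
    return mask_cast<uint>(false) == std::numeric_limits<uint>::max()
        && mask_cast<uint>(true) == uint{0};
}

// Negating in unsigned int first and widening afterwards only fills the low bits:
// the result is UINT_MAX zero-extended, not the all-ones 64-bit value
// (assuming unsigned int is narrower than 64 bits).
inline std::uint64_t negate_then_widen(unsigned b) {
    return static_cast<std::uint64_t>(0u - b);
}
```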

Answer 3:

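#include <type_traits>

// Converts implicitly to any unsigned integer type, yielding all ones or zero.
struct full_mask {
  bool b;
  full_mask(bool b_):b(b_){}
  template<
    typename int_type,
    typename=typename std::enable_if<std::is_unsigned<int_type>::value>::type
  >
  operator int_type() const {
    return static_cast<int_type>(-b); // -b is int 0 or -1; the cast wraps -1 to all ones
  }
};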

use:

unsigned long long_mask = full_mask(b);
unsigned char char_mask = full_mask(b);
char char_mask2 = full_mask(b); // does not compile where plain char is signed

Basically, I use the class full_mask to deduce the type we are converting to, and automatically generate a bit-filled unsigned value of that type. I tossed in some SFINAE code to require that the type I'm converting to is an unsigned integer type.

Answer 4:

You can convert 1 / 0 to 0 / -1 just by decrementing. That inverts the boolean condition, but if you can generate the inverse of the boolean in the first place, or use the inverse of the mask, then it's only a single operation instead of two.
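A minimal sketch of the decrement, with a made-up function name and uint64_t picked as the mask type for illustration:

```cpp
#include <cstdint>

// Returns 0 when b is true and all ones when b is false, the same result as -(!b).
// The subtraction happens in uint64_t, so 0 - 1 wraps to the all-ones value.
inline std::uint64_t mask_by_decrement(bool b) {
    return static_cast<std::uint64_t>(b) - 1;
}
```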

Source: https://stackoverflow.com/questions/13907111/is-condition-a-correct-way-to-obtain-a-full-bitvector-from-a-boolean-mask-b

Tags

c++

logic

bit-manipulation