find the index of the highest bit set of a 32-bit number without loops obviously


梦毁少年i 2021-01-07 01:56

Here's a tough one (at least I had a hard time :P):

Find the index of the highest bit set in a 32-bit number without using any loops.

11 answers

予麋鹿 (original poster)

2021-01-07 02:24

Very interesting question. Here is an answer with a benchmark.

Solution using a loop

uint8_t highestBitIndex( uint32_t n )
{
    uint8_t r = 0;
    while ( n >>= 1 )
        r++;
    return r;
}

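This helps clarify the question, but it is inefficient: it takes one iteration for every bit position below the highest set bit, and it uses the loop the question rules out.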
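To pin down what "index" means here (bit 0 is the least significant bit), the sketch below spot-checks a few inputs against the loop version. The helper name checkLoopVersion is made up for this illustration and is not part of the original answer. Note that the loop returns 0 for n == 0, which cannot be told apart from n == 1.

```cpp
#include <cassert>
#include <cstdint>

// Loop version from above, repeated so the snippet compiles on its own.
uint8_t highestBitIndex(uint32_t n)
{
    uint8_t r = 0;
    while (n >>= 1)
        r++;
    return r;
}

// Hypothetical helper (illustration only): spot-checks a few inputs.
void checkLoopVersion()
{
    assert(highestBitIndex(1u) == 0);           // only bit 0 is set
    assert(highestBitIndex(0x10u) == 4);        // 0b10000
    assert(highestBitIndex(0x13u) == 4);        // lower bits do not matter
    assert(highestBitIndex(0x80000000u) == 31); // top bit
    assert(highestBitIndex(0u) == 0);           // no bit set: same result as n == 1
}
```

Every other solution in this answer should agree with these values for n != 0.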

Solution using log

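The same index is the floor of the base-2 logarithm:

#include <math.h>

uint8_t highestSetBitIndex2(uint32_t n) {
    // n == 0 is invalid here: log(0) is -infinity and the cast is undefined
    return (uint8_t)(log(n) / log(2));
}

However, it is also inefficient (even slower than the loop, see the benchmark), and the result depends on floating-point rounding.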

Solution using built-in instruction

uint8_t highestBitIndex3( uint32_t n )
{
    return 31 - __builtin_clz(n);
}

This solution is very efficient, but it only works with specific compilers (gcc and clang will do) and on specific platforms. The result of __builtin_clz is also undefined for n == 0.

NB: Subtract from 31, not 32: __builtin_clz counts the leading zeros, and bit indices start at 0.
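Since __builtin_clz(0) is undefined, a caller that may pass 0 needs a guard. Below is a minimal sketch of one way to do that; the wrapper name and the choice of -1 as the "no bit set" value are assumptions made for this example, not part of the original answer.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: unsigned int is 32 bits wide, which __builtin_clz operates on.
static_assert(sizeof(unsigned int) == 4, "this sketch assumes 32-bit unsigned int");

// Hypothetical wrapper (illustrative name): returns -1 when no bit is set.
// Requires GCC or Clang, since __builtin_clz is a compiler extension.
int highestBitIndexOrMinusOne(uint32_t n)
{
    if (n == 0)
        return -1;                // never call __builtin_clz with 0
    return 31 - __builtin_clz(n); // clz = number of zeros above the highest set bit
}

// Hypothetical helper (illustration only): spot-checks the wrapper.
void checkGuardedClz()
{
    assert(highestBitIndexOrMinusOne(0u) == -1);
    assert(highestBitIndexOrMinusOne(1u) == 0);
    assert(highestBitIndexOrMinusOne(0x80000000u) == 31);
}
```

The branch costs little, and it can be dropped wherever the caller already knows that n is nonzero.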

Solution with intrinsic

#include <x86intrin.h>

uint8_t highestSetBitIndex5(uint32_t n)
{
    return _bit_scan_reverse(n); // undefined behavior if n == 0
}

This compiles to the bsr instruction.

Solution using inline assembly

BSR and LZCNT can also be issued directly through inline assembly (GCC/Clang extended asm, x86 only):

uint8_t highestSetBitIndex4(uint32_t n) // result is undefined if n == 0
{
    uint32_t index;
    // AT&T operand order: source first, destination second
    __asm__ ("bsrl %1, %0" : "=r"(index) : "r"(n) : "cc");
    return (uint8_t)index;
}

uint8_t highestSetBitIndex7(uint32_t n) // no valid index if n == 0
{
    uint32_t zeros;
    // needs a CPU with LZCNT support; older CPUs decode this as bsr
    __asm__ ("lzcntl %1, %0" : "=r"(zeros) : "r"(n) : "cc");
    return (uint8_t)(31 - zeros);
}

NB: Do not use this unless you know what you are doing.

Solution using lookup table and magic number multiplication (probably the best AFAIK)

First you use the following function to clear all the bits except the highest one:

uint32_t keepHighestBit( uint32_t n )
{
    n |= (n >>  1);
    n |= (n >>  2);
    n |= (n >>  4);
    n |= (n >>  8);
    n |= (n >> 16);
    return n - (n >> 1);
}

Credit: The idea comes from Henry S. Warren, Jr.'s book Hacker's Delight.
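To see why this works, here is a worked trace for one input, written as a small sketch. The helper name checkKeepHighestBit is invented for this illustration. The OR steps copy the highest set bit into every lower position, and the final subtraction removes everything below it again.

```cpp
#include <cassert>
#include <cstdint>

// Same function as above, repeated so the snippet compiles on its own.
uint32_t keepHighestBit(uint32_t n)
{
    n |= (n >>  1);
    n |= (n >>  2);
    n |= (n >>  4);
    n |= (n >>  8);
    n |= (n >> 16);
    return n - (n >> 1);
}

// Hypothetical helper (illustration only).
// Trace for n = 0b00101100 (0x2C):
//   n |= n >> 1   ->  0b00111110
//   n |= n >> 2   ->  0b00111111
//   n |= n >> 4   ->  0b00111111  (the remaining shifts change nothing)
//   n - (n >> 1)  ->  0b00111111 - 0b00011111 = 0b00100000 (0x20)
void checkKeepHighestBit()
{
    assert(keepHighestBit(0x2Cu) == 0x20u);
    assert(keepHighestBit(1u) == 1u);
    assert(keepHighestBit(0xFFFFFFFFu) == 0x80000000u);
    assert(keepHighestBit(0u) == 0u); // no bit set: result stays 0
}
```

The result is always 0 or a power of two, which is what the multiplication step that follows relies on.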

Then we use an algorithm based on a de Bruijn sequence to map that single bit to its index with one multiplication, one shift, and one table lookup:

uint8_t highestBitIndex8( uint32_t b )
{
    static const uint32_t deBruijnMagic = 0x06EB14F9; // times a power of two, the top 6 bits form a unique table index
    static const uint8_t deBruijnTable[64] = {
         0,  0,  0,  1,  0, 16,  2,  0, 29,  0, 17,  0,  0,  3,  0, 22,
        30,  0,  0, 20, 18,  0, 11,  0, 13,  0,  0,  4,  0,  7,  0, 23,
        31,  0, 15,  0, 28,  0,  0, 21,  0, 19,  0, 10, 12,  0,  6,  0,
         0, 14, 27,  0,  0,  9,  0,  5,  0, 26,  8,  0, 25,  0, 24,  0,
     };
    return deBruijnTable[(keepHighestBit(b) * deBruijnMagic) >> 26];
}

Another version, which multiplies the smeared value directly and needs only a 32-entry table:

void propagateBits(uint32_t *n) {
    *n |= *n >> 1;
    *n |= *n >> 2;
    *n |= *n >> 4;
    *n |= *n >> 8;
    *n |= *n >> 16;
}

uint8_t highestSetBitIndex8(uint32_t b)
{
  static const uint32_t Magic = (uint32_t) 0x07C4ACDD;

  static const int BitTable[32] = {
     0,  9,  1, 10, 13, 21,  2, 29,
    11, 14, 16, 18, 22, 25,  3, 30,
     8, 12, 20, 28, 15, 17, 24,  7,
    19, 27, 23,  6, 26,  5,  4, 31,
  };
  propagateBits(&b);

  return BitTable[(b * Magic) >> 27];
}
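Magic constants and lookup tables are easy to mistype, so it is worth checking a table version against the plain loop. The following is a rough sketch of such a check for the 32-entry version; the names referenceIndex, deBruijnIndex, countMismatches and verifyDeBruijn, as well as the pseudo-random sampling, are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>

// Reference: the loop version from the start of this answer.
uint8_t referenceIndex(uint32_t n)
{
    uint8_t r = 0;
    while (n >>= 1)
        r++;
    return r;
}

// The 32-entry de Bruijn version from above, with the bit smearing inlined.
uint8_t deBruijnIndex(uint32_t b)
{
    static const uint8_t BitTable[32] = {
         0,  9,  1, 10, 13, 21,  2, 29,
        11, 14, 16, 18, 22, 25,  3, 30,
         8, 12, 20, 28, 15, 17, 24,  7,
        19, 27, 23,  6, 26,  5,  4, 31,
    };
    b |= b >> 1;
    b |= b >> 2;
    b |= b >> 4;
    b |= b >> 8;
    b |= b >> 16;
    return BitTable[(b * UINT32_C(0x07C4ACDD)) >> 27];
}

// Hypothetical checker: counts inputs on which the two versions disagree.
uint32_t countMismatches()
{
    uint32_t mismatches = 0;
    for (int k = 0; k < 32; ++k) {
        const uint32_t bit = UINT32_C(1) << k;
        const uint32_t filled = bit | (bit - 1); // bit k and everything below it
        mismatches += deBruijnIndex(bit) != referenceIndex(bit);
        mismatches += deBruijnIndex(filled) != referenceIndex(filled);
    }
    uint32_t x = 12345u; // simple linear congruential generator for sample inputs
    for (int i = 0; i < 100000; ++i) {
        x = x * 1664525u + 1013904223u;
        mismatches += deBruijnIndex(x) != referenceIndex(x);
    }
    return mismatches;
}

void verifyDeBruijn()
{
    assert(countMismatches() == 0);
}
```

Sampling is used here only to keep the check short; looping over all 2^32 inputs is also feasible and gives a complete proof.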

Benchmark with 100 million calls

Compiled and run with: g++ -std=c++17 highestSetBit.cpp -O3 && ./a.out

highestBitIndex1  136.8 ms  (loop)
highestBitIndex2  183.8 ms  (log(n) / log(2))
highestBitIndex3   10.6 ms  (de Bruijn lookup table with power of two, 64 entries)
highestBitIndex4    4.5 ms  (inline assembly bsr)
highestBitIndex5    6.7 ms  (intrinsic bsr)
highestBitIndex6    4.7 ms  (gcc lzcnt)
highestBitIndex7    7.1 ms  (inline assembly lzcnt)
highestBitIndex8   10.2 ms  (de Bruijn lookup table, 32 entries)

Note: the numbering in this list does not match the function names in the snippets above; go by the descriptions in parentheses.

I would personally go for highestBitIndex8 (the 32-entry de Bruijn lookup) if portability is your focus; otherwise the gcc built-in is nice.
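For anyone who wants to reproduce this kind of measurement, here is a minimal timing sketch. It is not the harness that produced the numbers above: the function name timeCallsMs, the input sequence 1..calls, and the checksum trick are all assumptions made for this example.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Loop version from the start of this answer, used here as the function under test.
uint8_t highestBitIndexLoop(uint32_t n)
{
    uint8_t r = 0;
    while (n >>= 1)
        r++;
    return r;
}

// Hypothetical harness: calls fn on 1..calls and returns the elapsed milliseconds.
// The results are summed into checksum so the calls cannot simply be discarded.
template <typename Fn>
double timeCallsMs(Fn fn, uint32_t calls, uint32_t &checksum)
{
    assert(calls > 0);
    const auto start = std::chrono::steady_clock::now();
    uint32_t sum = 0;
    for (uint32_t i = 1; i <= calls; ++i)
        sum += fn(i);
    const auto stop = std::chrono::steady_clock::now();
    checksum = sum;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as timeCallsMs(highestBitIndexLoop, 100000000u, checksum) would time 100 million calls. An optimizing compiler may still inline or vectorize the loop, so timings from a sketch like this are only a rough guide.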
