Find the index of the highest set bit of a 32-bit number, without loops

梦毁少年i 2021-01-07 01:56

Here's a tough one (at least I had a hard time with it :P):

Find the index of the highest set bit of a 32-bit number without using any loops.

11 answers
  •  予麋鹿 (OP)
    2021-01-07 02:24

    Very interesting question. Here is an answer with several solutions and a benchmark.


    Solution using a loop

    uint8_t highestBitIndex( uint32_t n )
    {
        uint8_t r = 0;
        while ( n >>= 1 )
            r++;
        return r;
    }
    

    This helps clarify the question, but it is highly inefficient.


    Solution using log

    The same computation can be expressed with logarithms (log comes from <cmath>):

    uint8_t highestSetBitIndex2(uint32_t n) {
        return (uint8_t)(log(n) / log(2));
    }
    

    However, it is also inefficient (even slower than the loop above, see the benchmark).


    Solution using built-in instruction

    uint8_t highestBitIndex3( uint32_t n )
    {
        return 31 - __builtin_clz(n);
    }
    

    This solution, while very efficient, only works with specific compilers (gcc and clang will do) and on specific platforms.

    NB: The subtraction is from 31, not 32, because bit indices run from 0 to 31. The result of __builtin_clz is undefined for n == 0.
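
Because the built-in gives no defined result for zero, callers that may pass 0 need a guard. The following is a minimal sketch of one, assuming gcc or clang and a 32-bit unsigned int; the function name and the -1 sentinel are illustrative choices, not part of the original answer.

```cpp
#include <cstdint>

// Hypothetical wrapper (name and sentinel are assumptions): returns -1 when
// no bit is set, because __builtin_clz(0) has an undefined result.
int highestBitIndexOrMinusOne(std::uint32_t n)
{
    // __builtin_clz takes an unsigned int; this sketch assumes it is 32 bits wide.
    static_assert(sizeof(unsigned int) == 4, "sketch assumes 32-bit unsigned int");
    if (n == 0)
        return -1;                 // no bit is set
    return 31 - __builtin_clz(n);  // index of the highest set bit, 0..31
}
```

With this guard, an input of 0 yields -1, 1 yields 0, and 0x80000000 yields 31.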


    Solution with intrinsic

    #include <x86intrin.h> // header assumed: it declares _bit_scan_reverse on gcc and clang
    
    uint8_t highestSetBitIndex5(uint32_t n)
    {
        return _bit_scan_reverse(n); // result is undefined if n == 0
    }
    

    This compiles to the bsr instruction at the assembly level.


    Solution using inline assembly

    The BSR and LZCNT instructions can also be emitted directly with inline assembly:

    uint8_t highestSetBitIndex4(uint32_t n) // result is undefined if n == 0
    {
        uint32_t r;
        // AT&T operand order: source first, destination second
        __asm__ ("bsrl %1, %0" : "=r"(r) : "r"(n) : "cc");
        return (uint8_t)r;
    }
    
    uint8_t highestSetBitIndex7(uint32_t n) // needs a CPU with LZCNT; returns 255 if n == 0 (lzcnt yields 32)
    {
        uint32_t lz;
        __asm__ ("lzcntl %1, %0" : "=r"(lz) : "r"(n) : "cc");
        return (uint8_t)(31 - lz);
    }
    

    NB: Do not use this unless you know what you are doing.


    Solution using lookup table and magic number multiplication (probably the best AFAIK)

    First, use the following function to clear every bit except the highest set one:

    uint32_t keepHighestBit( uint32_t n )
    {
        n |= (n >>  1);
        n |= (n >>  2);
        n |= (n >>  4);
        n |= (n >>  8);
        n |= (n >> 16);
        return n - (n >> 1);
    }
    

    Credit: The idea comes from Henry S. Warren, Jr.'s book Hacker's Delight.
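
To see why the shifts work, it can help to trace one value by hand. The sketch below splits the function into two steps and follows n = 0x14 through each line; the names smearRight and isolateHighestBit are hypothetical, and the logic is meant to mirror keepHighestBit rather than replace it.

```cpp
#include <cstdint>

// Hypothetical names; the steps mirror keepHighestBit from the answer.
constexpr std::uint32_t smearRight(std::uint32_t n)
{
    // Trace for n = 0x14 (binary 0001 0100):
    n |= n >> 1;   // 0x14 | 0x0A = 0x1E
    n |= n >> 2;   // 0x1E | 0x07 = 0x1F
    n |= n >> 4;   // 0x1F | 0x01 = 0x1F, nothing left to fill
    n |= n >> 8;   // unchanged
    n |= n >> 16;  // unchanged
    return n;      // every bit at or below the highest set bit is now 1
}

constexpr std::uint32_t isolateHighestBit(std::uint32_t n)
{
    const std::uint32_t s = smearRight(n);
    return s - (s >> 1);   // 0x1F - 0x0F = 0x10
}

static_assert(smearRight(0x14u) == 0x1Fu, "bits 0..4 are set after smearing");
static_assert(isolateHighestBit(0x14u) == 0x10u, "only bit 4 survives");
static_assert(isolateHighestBit(0u) == 0u, "zero stays zero");
```

Each shift doubles the length of the run of ones below the highest set bit, so five steps are enough to cover all 32 bits.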

    Then a multiplication by a de Bruijn-style constant turns the isolated bit into a unique 6-bit index into a lookup table:

    uint8_t highestBitIndex8( uint32_t b )
    {
        static const uint32_t deBruijnMagic = 0x06EB14F9; // equivalent to 0b111(0xff ^ 3)
        static const uint8_t deBruijnTable[64] = {
             0,  0,  0,  1,  0, 16,  2,  0, 29,  0, 17,  0,  0,  3,  0, 22,
            30,  0,  0, 20, 18,  0, 11,  0, 13,  0,  0,  4,  0,  7,  0, 23,
            31,  0, 15,  0, 28,  0,  0, 21,  0, 19,  0, 10, 12,  0,  6,  0,
             0, 14, 27,  0,  0,  9,  0,  5,  0, 26,  8,  0, 25,  0, 24,  0,
         };
        return deBruijnTable[(keepHighestBit(b) * deBruijnMagic) >> 26];
    }
    

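    Another version skips the final subtraction and multiplies the propagated value directly, which fits a 32-entry table: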

    void propagateBits(uint32_t *n) {
        *n |= *n >> 1;
        *n |= *n >> 2;
        *n |= *n >> 4;
        *n |= *n >> 8;
        *n |= *n >> 16;
    }
    
    uint8_t highestSetBitIndex8(uint32_t b)
    {
      static const uint32_t Magic = (uint32_t) 0x07C4ACDD;
    
      static const int BitTable[32] = {
         0,  9,  1, 10, 13, 21,  2, 29,
        11, 14, 16, 18, 22, 25,  3, 30,
         8, 12, 20, 28, 15, 17, 24,  7,
        19, 27, 23,  6, 26,  5,  4, 31,
      };
      propagateBits(&b);
    
      return BitTable[(b * Magic) >> 27];
    }
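
The 32-entry table does not have to be taken on faith: it can be rebuilt from the multiplier. The sketch below is one way to do that, under the assumption that the inputs are exactly the 32 values propagateBits can produce for a nonzero argument (all bits from 0 up to i set). The helper name buildBitTable is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: rebuilds the 32-entry table for a given multiplier by
// hashing each propagated input 2^(i+1) - 1 and recording i in that slot.
std::array<int, 32> buildBitTable(std::uint32_t magic)
{
    std::array<int, 32> table;
    table.fill(-1);                                   // -1 marks an unused slot
    for (int i = 0; i < 32; ++i) {
        // All bits from 0 to i set, which is what propagateBits produces.
        const std::uint32_t smeared = 0xFFFFFFFFu >> (31 - i);
        const std::uint32_t slot = (smeared * magic) >> 27;  // top 5 bits of the product
        assert(table[slot] == -1 && "multiplier maps two inputs to one slot");
        table[slot] = i;
    }
    return table;
}
```

Calling buildBitTable(0x07C4ACDD) fills every slot exactly once, which is the property that makes the multiply-and-shift a valid lookup; a multiplier without that property would trip the assertion.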
    

    Benchmark with 100 million calls

    Compiled and run with: g++ -std=c++17 highestSetBit.cpp -O3 && ./a.out

    highestBitIndex1  136.8 ms (loop)
    highestBitIndex2  183.8 ms (log(n) / log(2))
    highestBitIndex3   10.6 ms (de Bruijn lookup table with power of two, 64 entries)
    highestBitIndex4    4.5 ms (inline assembly bsr)
    highestBitIndex5    6.7 ms (intrinsic bsr)
    highestBitIndex6    4.7 ms (gcc lzcnt)
    highestBitIndex7    7.1 ms (inline assembly lzcnt)
    highestBitIndex8   10.2 ms (de Bruijn lookup table, 32 entries)
    

    Personally, I would go for highestBitIndex8 if portability is your focus; otherwise the gcc built-in is a good choice.
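
The benchmark code itself is not shown above, so the following is only a sketch of how such timings could be collected, not the harness that produced those numbers. The function name timeCallsMs, the pseudo-random input stream, and the checksum parameter are all assumptions made for illustration.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical harness, not the author's benchmark code.
// Calls f on `calls` nonzero inputs and returns the elapsed time in milliseconds.
template <typename F>
double timeCallsMs(F f, std::uint32_t calls, std::uint64_t &checksum)
{
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t sum = 0;
    std::uint32_t x = 1;                     // assumed input: a simple LCG stream
    for (std::uint32_t i = 0; i < calls; ++i) {
        x = x * 1664525u + 1013904223u;      // Numerical Recipes LCG constants
        sum += f(x | 1u);                    // | 1 keeps the argument nonzero
    }
    const auto stop = std::chrono::steady_clock::now();
    checksum = sum;   // consuming the results keeps the loop from being optimized away
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Passing each candidate function with calls set to 100000000 would correspond to the 100 million calls mentioned above. Comparing the checksums across candidates also gives a cheap check that they agree on the same inputs.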
