What is the fastest way to return the positions of all set bits in a 64-bit integer?

北荒 2020-12-13 04:05

I need a fast way to get the positions of all one bits in a 64-bit integer. For example, given x = 123703, I'd like to fill an array idx[] = {0, 1, 2, 4, 5, 8, 9, 13, 14, 15, 16}.
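A straightforward baseline for this task: repeatedly take the index of the lowest set bit, store it, and clear that bit. This is a minimal sketch assuming GCC or Clang, because `__builtin_ctzll` (count trailing zeros) is a compiler builtin rather than standard C; the function name `bit_positions_ctz` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Writes the positions of the set bits of x, lowest first, into idx
// (room for 64 entries) and returns how many were written.
// __builtin_ctzll is a GCC/Clang builtin and is undefined for 0,
// which the loop condition rules out.
static size_t bit_positions_ctz(uint64_t x, unsigned char *idx) {
    size_t n = 0;
    while (x) {
        idx[n++] = (unsigned char)__builtin_ctzll(x);  // index of lowest set bit
        x &= x - 1;                                    // clear that bit
    }
    assert(n <= 64);
    return n;
}
```

For x = 123703 it writes 0, 1, 2, 4, 5, 8, 9, 13, 14, 15, 16 and returns 11. The loop runs once per set bit, not once per bit.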

  • 2020-12-13 04:33

    I believe the key to performance here is to focus on the larger problem rather than on micro-optimizing the extraction of bit positions out of a random integer.

    Judging by your sample code and previous SO question, you are enumerating all words with K bits set, in order, and extracting the bit indices from them. This greatly simplifies matters.

    If so, then instead of rebuilding the bit positions on every iteration, try incrementing the positions in the position array directly. Half of the time this will involve a single loop iteration and increment.

    Something along these lines:

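    #include <stddef.h>
    #include <stdio.h>

    void process(const unsigned int *bits, size_t num);

    // Walk through all len-bit words with num bits set, in increasing order
    // (assumes 1 <= num <= len <= 64)
    void enumerate(size_t num, size_t len) {
        size_t i;
        unsigned int bitpos[64 + 1];
    
        // Seed with the lowest word (positions 0..num-1) plus a sentinel
        for(i = 0; i < num; ++i)
            bitpos[i] = i;
        bitpos[i] = 0;  // never equals bitpos[num-1] + 1, so it ends the scan below
    
        // Here goes the main loop
        do {
            // Do something with the resulting data
            process(bitpos, num);
    
            // Reset the lowest run of consecutive positions to 0, 1, 2, ...
            // and find the first position that has room to move up
            for(i = 0; bitpos[i + 1] == bitpos[i] + 1; ++i)
                bitpos[i] = i;
        // Advance that position; stop once the top position reaches len
        } while(++bitpos[i] != len);
    }
    
    // Test function: prints the positions from highest to lowest
    void process(const unsigned int *bits, size_t num) {
        do
            printf("%u ", bits[--num]);
        while(num);
        putchar('\n');
    }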
    

    Not particularly optimized, but you get the general idea.
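    If the loop also needs the word itself and not just the positions, a well-known alternative (Gosper's hack, not part of this answer) steps from one word with K bits set to the next larger one using a few arithmetic operations. A sketch, assuming 1 <= num <= len <= 63; `next_same_popcount` and `enumerate_words` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Gosper's hack: the next larger integer with the same number of set bits.
// Requires v != 0; the caller stops before the result leaves len bits.
static uint64_t next_same_popcount(uint64_t v) {
    assert(v != 0);
    uint64_t c = v & (~v + 1);          // lowest set bit (same as v & -v)
    uint64_t r = v + c;                 // carry the lowest run of ones up one place
    return (((r ^ v) >> 2) / c) | r;    // refill the remaining ones at the bottom
}

// Stores every len-bit word with num bits set, in increasing order, into out
// and returns how many there are. Assumes 1 <= num <= len <= 63.
static size_t enumerate_words(unsigned num, unsigned len, uint64_t *out) {
    size_t n = 0;
    for (uint64_t v = (UINT64_C(1) << num) - 1; v < (UINT64_C(1) << len);
         v = next_same_popcount(v))
        out[n++] = v;
    return n;
}
```

    For num = 2 and len = 4 this visits 0x3, 0x5, 0x6, 0x9, 0xa and 0xc, the same words in the same order as enumerate(2, 4) above.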

  • 2020-12-13 04:40

    Here's some tight code, written for a single byte (8 bits); it should expand to 64 bits in an obvious way.

    #include <stdio.h>

    int main(void)
    {
        unsigned int x = 187;   // binary 1011 1011

        int ans[8] = {-1,-1,-1,-1,-1,-1,-1,-1};
        int idx = 0;

        while (x)
        {
            // x & ~(x-1) isolates the lowest set bit
            switch (x & ~(x-1))
            {
            case 0x01: ans[idx++] = 0; break;
            case 0x02: ans[idx++] = 1; break;
            case 0x04: ans[idx++] = 2; break;
            case 0x08: ans[idx++] = 3; break;
            case 0x10: ans[idx++] = 4; break;
            case 0x20: ans[idx++] = 5; break;
            case 0x40: ans[idx++] = 6; break;
            case 0x80: ans[idx++] = 7; break;
            }

            x &= x-1;           // clear the lowest set bit
        }

        for (int i = 0; i < 8; ++i)
            printf("%d ", ans[i]);
        putchar('\n');

        getchar();
        return 0;
    }
    

    Output array should be:

    ans = {0,1,3,4,5,7,-1,-1};
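    Expanding the switch to 64 cases works but gets long. One common alternative (a swapped-in technique, not the answerer's code) maps the isolated lowest bit to its index by multiplying with a 64-bit de Bruijn constant and looking up the top 6 bits in a 64-entry table. A sketch assuming C99 and `uint64_t`; the constant `0x03f79d71b4cb0a89` is a widely used de Bruijn sequence, the table is built at startup rather than written out, and the function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// 64-bit de Bruijn sequence: for each i, the top 6 bits of (1 << i) * DEBRUIJN64
// form a different value, so they can index a 64-entry table.
#define DEBRUIJN64 UINT64_C(0x03f79d71b4cb0a89)

static unsigned char debruijn_index[64];

// Build the table once; the second loop checks that no two bits collided.
static void build_debruijn_index(void) {
    for (int i = 0; i < 64; ++i)
        debruijn_index[((UINT64_C(1) << i) * DEBRUIJN64) >> 58] = (unsigned char)i;
    for (int i = 0; i < 64; ++i)
        assert(debruijn_index[((UINT64_C(1) << i) * DEBRUIJN64) >> 58] == i);
}

// The answer's loop, with the switch replaced by multiply + lookup.
// Writes positions lowest first into idx (room for 64) and returns the count.
static size_t bit_positions_debruijn(uint64_t x, unsigned char *idx) {
    size_t n = 0;
    while (x) {
        uint64_t lowest = x & ~(x - 1);                 // isolate the lowest set bit
        idx[n++] = debruijn_index[(lowest * DEBRUIJN64) >> 58];
        x &= x - 1;                                     // clear it
    }
    return n;
}
```

    After build_debruijn_index() has run once, bit_positions_debruijn(187, idx) writes 0, 1, 3, 4, 5, 7 and returns 6, matching the 8-bit example.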
    
  • 2020-12-13 04:44

    As a minimal modification:

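    #include <stdint.h>
    #include <string.h>

    // K (the number of set bits) and tab[] come from the surrounding code.
    uint64_t x;            // input word (unsigned, so >>= shifts in zeros)
    char idx[K+1];
    char *dst = idx;
    const int BITS = 8;
    for (int i = 0; i < 64; i += BITS) {
      int y = (int)(x & ((1u << BITS) - 1));
      strcpy(dst, tab[y]);              // tab[y] is a _string_
      char *end = dst + strlen(dst);    // strcat() would return dst, not the end
      for (; dst != end; ++dst)
      {
        *dst += (i - 1); // strings are NUL-terminated, so stored positions are 1 to BITS
      }
      x >>= BITS;
    }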
    

    The choice of BITS determines the size of the table (2^BITS entries); 8, 13 and 16 are logical choices. Each entry is a zero-terminated string containing the bit positions offset by 1, so that position 0 cannot be mistaken for the terminator. E.g. tab[5] (binary 101) is "\x03\x01". The inner loop removes this offset.

    Slightly more efficient: replace the string copy and the inner loop with

    char const* ptr = tab[y];
    while (*ptr)
    {
       *dst++ = *ptr++ + (i-1);
    }
    

    Loop unrolling can be a bit of a pain if the loop contains branches, because copying those branch statements doesn't help the branch predictor. I'll happily leave that decision to the compiler.
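    A runnable sketch of the table approach with BITS = 8, using the more efficient copy loop. Assumptions: the table is built at startup instead of being written out as 256 literals, positions are stored in ascending order within each string (unlike the descending "\x03\x01" example) so that the output comes out sorted, and `build_tab` and `bit_positions_table` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS 8

// tab[y] holds the set-bit positions of byte y, each stored as position + 1
// so that position 0 does not end the string. Ascending order here.
static char tab[1 << BITS][BITS + 1];

static void build_tab(void) {
    for (int y = 0; y < (1 << BITS); ++y) {
        int n = 0;
        for (int b = 0; b < BITS; ++b)
            if (y & (1 << b))
                tab[y][n++] = (char)(b + 1);
        tab[y][n] = '\0';
    }
}

// Writes the positions of the set bits of x into idx (room for 64 entries)
// and returns how many were written.
static size_t bit_positions_table(uint64_t x, unsigned char *idx) {
    assert(tab[1][0] == 1);   // build_tab() must have run first
    unsigned char *dst = idx;
    for (int i = 0; i < 64; i += BITS) {
        const char *ptr = tab[x & ((1u << BITS) - 1)];
        while (*ptr)
            *dst++ = (unsigned char)(*ptr++ + (i - 1));  // undo the +1 offset
        x >>= BITS;
    }
    return (size_t)(dst - idx);
}
```

    For x = 123703 this yields 0, 1, 2, 4, 5, 8, 9, 13, 14, 15, 16. The 256-entry table fits in a few kilobytes.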

    One thing I'm considering is that tab is an array of pointers to strings, and these strings are highly similar: "\x1" is a suffix of "\x3\x1". In fact, each string that doesn't start with "\x8" is a suffix of a string that does. I'm wondering how many unique strings you need, and to what degree tab[] is needed at all. E.g. by the logic above, tab[128+x] == tab[x]-1 for any x below 128.

    [edit]

    Never mind: you definitely need the 128 tab entries starting with "\x8", since they're never the suffix of another string. Still, the tab[128+x] == tab[x]-1 rule means that you can save half the entries, at the cost of two extra instructions: char const* ptr = tab[x & 0x7F] - ((x>>7) & 1). (Set up tab[] to point just past the \x8.)
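    The halving trick can be sketched as follows, keeping the answer's descending, 1-based encoding. Assumptions: the 128 strings live in a hypothetical `suffix_store` array built at startup, `half_tab[j]` points just past the leading "\x8", and the hypothetical `lookup(y)` stands in for `tab[y]`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// One string per value j of the low 7 bits: "\x8" (bit 7, 1-based) followed by
// the 1-based positions of j in descending order, then the terminator.
static char suffix_store[128][9];
static const char *half_tab[128];   // half_tab[j] points just past the "\x8"

static void build_half_tab(void) {
    for (int j = 0; j < 128; ++j) {
        int n = 0;
        suffix_store[j][n++] = 8;
        for (int b = 6; b >= 0; --b)
            if (j & (1 << b))
                suffix_store[j][n++] = (char)(b + 1);
        suffix_store[j][n] = '\0';
        half_tab[j] = suffix_store[j] + 1;
    }
}

// Equivalent of tab[y] for any byte y, using the tab[128+x] == tab[x]-1 rule:
// bytes with bit 7 set start one character earlier, at the "\x8".
static const char *lookup(unsigned y) {
    assert(y < 256);
    return half_tab[y & 0x7F] - ((y >> 7) & 1);
}
```

    lookup(5) yields the bytes 3, 1 (the "\x03\x01" example), and lookup(133) yields 8, 3, 1 from the same storage, starting one character earlier.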

  • 2020-12-13 04:44

    If I take "I need a fast way to get the position of all one bits in a 64-bit integer" literally...

    I realise this is a few weeks old, but out of curiosity: I remember, way back in my assembly days on the CBM64 and Amiga, using an arithmetic shift and then examining the carry flag. If it's set, the shifted-out bit was 1; if it's clear, the bit was 0.

    E.g. for an arithmetic shift left (examining from bit 63 down to bit 0):

    Pseudocode (ignore instruction-mix errors and oversimplification; it's been a while):
    
                move #64, counter
        loop:   ASL 64bitinteger    ; the shifted-out bit lands in the carry flag
                BCS carryset
        decctr: dec counter
                bne loop
                exit

        carryset:
                ; store counter-1 (i.e. the bit position) in datastruct indexed by counter
                jmp decctr
    

    ...I hope you get the idea.

    I haven't used assembly since then, but I wonder whether some C++ inline assembly along the lines of the above could do the same job here. We could do the whole conversion in assembly (very few lines of code), building up an appropriate data structure, and C++ could simply examine the result.

    If this is possible, I'd imagine it would be pretty fast.
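    The same shift-and-test-carry loop can also be written in portable C without inline assembly: test bit 63 (the bit an arithmetic shift left would move into the carry flag), then shift. A sketch with a hypothetical function name; like the pseudocode, it always runs 64 iterations and reports positions from 63 down to 0:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Emulates "ASL, then branch on carry": before each left shift, bit 63 is
// the bit that would land in the carry flag.
// Writes positions highest first into idx (room for 64) and returns the count.
static size_t bit_positions_shift(uint64_t x, unsigned char *idx) {
    size_t n = 0;
    for (int counter = 64; counter > 0; --counter) {
        int carry = (int)(x >> 63);   // the "carry" after this shift
        x <<= 1;
        if (carry)
            idx[n++] = (unsigned char)(counter - 1);  // current bit position
    }
    assert(n <= 64);
    return n;
}
```

    For x = 123703 it writes 16, 15, 14, 13, 9, 8, 5, 4, 2, 1, 0. Unlike the lowest-set-bit loops, its cost does not depend on how many bits are set.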
