What is the fastest way to return the positions of all set bits in a 64-bit integer?

后端未结

关注

 10  1906

I need a fast way to get the position of all one bits in a 64-bit integer. For example, given x = 123703, I\'d like to fill an array idx[] = {0, 1, 2, 4,


                      
              相关标签:


      
      
        
          10条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  灰色年华        
                
              
                            
                2020-12-13 04:33
              
            
            
                                                                       
I believe the key to performance here is to focus on the larger problem rather than on micro-optimizing the extraction of bit positions out of a random integer.

Judging by your sample code and previous SO question you are enumerating all words with K bits set in order, and extracting the bit indices out of these. This greatly simplifies matters.

If so then instead of rebuilding the bit position each iteration try directly incrementing the positions in the bit array. Half of the time this will involve a single loop iteration and increment.

Something along these lines:

// Walk through all len-bit words with num-bits set in order
void enumerate(size_t num, size_t len) {
    size_t i;
    unsigned int bitpos[64 + 1];

    // Seed with the lowest word plus a sentinel
    for(i = 0; i < num; ++i)
        bitpos[i] = i;
    bitpos[i] = 0;

    // Here goes the main loop
    do {
        // Do something with the resulting data
        process(bitpos, num);

        // Increment the least-significant series of consecutive bits
        for(i = 0; bitpos[i + 1] == bitpos[i] + 1; ++i)
            bitpos[i] = i;
    // Stop on reaching the top
    } while(++bitpos[i] != len);
}

// Test function
void process(const unsigned int *bits, size_t num) {
    do
        printf("%d ", bits[--num]);
    while(num);
    putchar('\n');
}


Not particularly optimized but you get the general idea.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  伪装坚强ぢ        
                
              
                            
                2020-12-13 04:40
              
            
            
                                                                       
Here's some tight code, written for 1-byte (8-bits), but it should easily, obviously expand to 64-bits.

int main(void)
{
    int x = 187;

    int ans[8] = {-1,-1,-1,-1,-1,-1,-1,-1};
    int idx = 0;

    while (x)
    {
        switch (x & ~(x-1))
        {
        case 0x01: ans[idx++] = 0; break;
        case 0x02: ans[idx++] = 1; break;
        case 0x04: ans[idx++] = 2; break;
        case 0x08: ans[idx++] = 3; break;
        case 0x10: ans[idx++] = 4; break;
        case 0x20: ans[idx++] = 5; break;
        case 0x40: ans[idx++] = 6; break;
        case 0x80: ans[idx++] = 7; break;
        }

        x &= x-1;
    }

   getchar();
   return 0;
}


Output array should be:

ans = {0,1,3,4,5,7,-1,-1};

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  眼角桃花        
                
              
                            
                2020-12-13 04:44
              
            
            
                                                                       
As a minimal modification:

int64_t x;            
char idx[K+1];
char *dst=idx;
const int BITS = 8;
for (int i = 0 ; i < 64+BITS; i += BITS) {
  int y = (x & ((1<<BITS)-1));
  char* end = strcat(dst, tab[y]); // tab[y] is a _string_
  for (; dst != end; ++dst)
  {
    *dst += (i - 1); // tab[] is null-terminated so bit positions are 1 to BITS.
  }
  x >>= BITS;
}


The choice of BITS determines the size of the table. 8, 13 and 16 are logical choices. Each entry is a string, zero-terminated and containing bit positions with 1 offset. I.e. tab[5] is "\x03\x01". The inner loop fixes this offset.

Slightly more efficient: replace the strcat and inner loop by

char const* ptr = tab[y];
while (*ptr)
{
   *dst++ = *ptr++ + (i-1);
}


Loop unrolling can be a bit of a pain if the loop contains branches, because copying those branch statements doesn't help the branch predictor. I'll happily leave that decision to the compiler.

One thing I'm considering is that tab[y] is an array of pointers to strings. These are highly similar: "\x1" is a suffix of "\x3\x1". In fact, each string which doesn't start with "\x8" is a suffix of a string which does. I'm wondering how many unique strings you need, and to what degree tab[y] is in fact needed. E.g. by the logic above, tab[128+x] == tab[x]-1.

[edit]

Nevermind, you definitely need 128 tab entries starting with "\x8" since they're never the suffix of another string. Still, the tab[128+x] == tab[x]-1 rule means that you can save half the entries, but at the cost of two extra instructions: char const* ptr = tab[x & 0x7F] - ((x>>7) & 1). (Set up tab[] to point after the \x8)
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  春和景丽        
                
              
                            
                2020-12-13 04:44
              
            
            
                                                                       
If I take "I need a fast way to get the position of all one bits in a 64-bit integer" literally...

I realise this is a few weeks old, however and out of curiosity, I remember way back in my assembly days with the CBM64 and Amiga using an arithmetic shift and then examining the carry flag - if it's set then the shifted bit was 1, if clear then it's zero 

e.g. for an arithmetic shift left (examining from bit 64 to bit 0)....

pseudo code (ignore instruction mix etc errors and oversimplification...been a while):

    move #64+1, counter
    loop. ASL 64bitinteger       
    BCS carryset
    decctr. dec counter
    bne loop
    exit

    carryset. 
    //store #counter-1 (i.e. bit position) in datastruct indexed by counter
    jmp decctr


...I hope you get the idea.

I've not used assembly since then but I'm wondering if we could use some C++ in-line assembly similar to the above to do something similar here. We could do the whole conversion in assembly (very few lines of code), building up an appropriate data structure. C++ could simply examine the answer. 

If this is possible then I'd imagine it to be pretty fast.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     上一页
1
2
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复