Hamming weight (number of 1 bits in a number), mixing C with assembly


花落未央 2020-12-21 11:39

I'm trying to count how many 1 bits there are in the numbers of an array.

First, I have code in C (it works OK):

int popcount2(int* array, int le


      
      
        
4 answers

感动是毒 (original poster) 2020-12-21 11:54

When I needed to create a popcount, I ended up using the 5's and 3's method from the Bit Twiddling Hacks @PaulR mentioned.  But if I wanted to do this with a loop, maybe something like this:

#include <stdio.h>   // printf
#include <stdlib.h>  // atoi

int popcount2(int v) {
   int result = 0;
   int junk;

   asm (
        "shr $1, %[v]      \n\t"   // shift low bit into CF
        "jz .Ldone%=       \n"     // and skip the loop if that was the only set bit
     ".Lstart%=:           \n\t"   // %= makes the labels unique if this asm is inlined or duplicated
        "adc $0, %[result] \n\t"   // add CF (0 or 1) to result
        "shr $1, %[v]      \n\t"
        "jnz .Lstart%=     \n"     // leave the loop after shifting out the last bit
     ".Ldone%=:            \n\t"
        "adc $0, %[result] \n\t"   // and add that last bit

        : [result] "+r" (result), "=r" (junk)
        : [v] "1" (v)
        : "cc"
   );

   return result;
}

int main(int argc, char *argv[])
{
   for (int x=0; x < argc-1; x++)
   {
      int v = atoi(argv[x+1]);

      printf("%d %d\n", v, popcount2(v));
   }
}


adc is almost always more efficient than branching on CF.

"=r" (junk) is a dummy output operand that is in the same register as v (the "1" constraint).  We're using this to tell the compiler that the asm statement destroys the v input.  We could have used [v] "+r"(v) to get a read-write operand, but we don't want the C variable v to be updated.
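For illustration, here is a rough sketch of the read-write alternative described above. It assumes GCC or clang on x86 and binds the "+r" operand to a scratch copy, so the caller-visible value is never touched. The names popcount_rw and scratch are made up for this example, and the plain C fallback is only there so the sketch still builds on other targets.

```c
/* Hypothetical variant: same shift-and-adc loop as above, but the shifted
   value is a read-write ("+r") operand bound to a scratch copy instead of
   a dummy output tied to the input. */
int popcount_rw(int v)
{
    int result = 0;
    unsigned int scratch = (unsigned int)v;  /* copy the asm is allowed to destroy */

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ (
        "shr $1, %[s]          \n\t"   /* shift low bit into CF */
        "jz .Ldone%=           \n"     /* nothing left after the shift */
     ".Lloop%=:                \n\t"
        "adc $0, %[result]     \n\t"   /* add CF (0 or 1) to result */
        "shr $1, %[s]          \n\t"
        "jnz .Lloop%=          \n"     /* loop while bits remain */
     ".Ldone%=:                \n\t"
        "adc $0, %[result]     \n\t"   /* add the last bit shifted out */
        : [result] "+r" (result), [s] "+r" (scratch)
        :
        : "cc"
    );
#else
    /* Portable fallback (assumption: not part of the original answer). */
    while (scratch) {
        result += (int)(scratch & 1u);
        scratch >>= 1;
    }
#endif
    return result;
}
```

Because scratch is a separate local, the compiler is free to keep v in its original register if it is needed later, which is the same effect the dummy output achieves.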

Note that the loop trip count for this implementation depends on the position of the highest set bit (bsr, or 32 - clz(v)), not on how many bits are set.  @rcgldr's implementation, which clears the lowest set bit every iteration, will typically be faster when the number of set bits is low but they're not all near the bottom of the integer.
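As a point of comparison, the two non-asm approaches mentioned in this answer can be sketched in plain C. This is only an illustration under my own naming (popcount_clear_lowest and popcount_parallel are hypothetical names, and the first function is not @rcgldr's actual code); it assumes 32-bit unsigned inputs.

```c
#include <stdint.h>

/* Clears the lowest set bit on every pass, so the loop runs once per set
   bit regardless of where those bits sit in the word. */
int popcount_clear_lowest(uint32_t v)
{
    int count = 0;
    while (v != 0) {
        v &= v - 1;   /* v - 1 flips the lowest set bit and everything below it */
        count++;
    }
    return count;
}

/* Parallel bit count using the 0x5555... and 0x3333... masks (the "5's and
   3's" method): sums bits in 2-bit, 4-bit, then 8-bit groups, with no loop. */
int popcount_parallel(uint32_t v)
{
    v = v - ((v >> 1) & 0x55555555u);                  /* 2-bit partial sums */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);  /* 4-bit partial sums */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                  /* 8-bit partial sums */
    return (int)((v * 0x01010101u) >> 24);             /* add the four bytes */
}
```

For an input such as 0x80000001, the first function loops twice, while the shift-based asm loop above has to shift through all 32 bit positions.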
    
             
                                                        
            
            
              
                