Shift masked bits to the lsb

前端未结

关注

 2  712

小鲜肉 2020-12-16 14:14

When you and some data with a mask you get some result which is of the same size as the data/mask. What I want to do, is to take the masked bits in the result (

2条回答

不思量自难忘° (楼主)

2020-12-16 15:08

This operation is known as compress right. It is implemented as part of BMI2 as the PEXT instruction, in Intel processors as of Haswell.

Unfortunately, without hardware support is it a quite annoying operation. Of course there is an obvious solution, just moving the bits one by one in a loop, here is the one given by Hackers Delight:

unsigned compress(unsigned x, unsigned m) {
   unsigned r, s, b;    // Result, shift, mask bit. 

   r = 0; 
   s = 0; 
   do {
      b = m & 1; 
      r = r | ((x & b) << s); 
      s = s + b; 
      x = x >> 1; 
      m = m >> 1; 
   } while (m != 0); 
   return r; 
}

But there is an other way, also given by Hackers Delight, which does less looping (number of iteration logarithmic in the number of bits) but more per iteration:

unsigned compress(unsigned x, unsigned m) {
   unsigned mk, mp, mv, t; 
   int i; 

   x = x & m;           // Clear irrelevant bits. 
   mk = ~m << 1;        // We will count 0's to right. 

   for (i = 0; i < 5; i++) {
      mp = mk ^ (mk << 1);             // Parallel prefix. 
      mp = mp ^ (mp << 2); 
      mp = mp ^ (mp << 4); 
      mp = mp ^ (mp << 8); 
      mp = mp ^ (mp << 16); 
      mv = mp & m;                     // Bits to move. 
      m = m ^ mv | (mv >> (1 << i));   // Compress m. 
      t = x & mv; 
      x = x ^ t | (t >> (1 << i));     // Compress x. 
      mk = mk & ~mp; 
   } 
   return x; 
}

Notice that a lot of the values there depend only on m. Since you only have 512 different masks, you could precompute those and simplify the code to something like this (not tested)

unsigned compress(unsigned x, int maskindex) {
   unsigned t; 
   int i; 

   x = x & masks[maskindex][0];

   for (i = 0; i < 5; i++) {
      t = x & masks[maskindex][i + 1]; 
      x = x ^ t | (t >> (1 << i));
   } 
   return x; 
}

Of course all of these can be turned into "not a loop" by unrolling, the second and third ways are probably more suitable for that. That's a bit of cheat however.

0 讨论(0)

查看其它2个回答