Shift masked bits to the lsb

小鲜肉 2020-12-16 14:14

When you AND some data with a mask, you get a result of the same size as the data/mask. What I want to do is to take the masked bits in the result (…

2 answers
  • 2020-12-16 14:58

    You can use the pack-by-multiplication technique similar to the one described here. This way you don't need any loop and can mix the bits in any order.

    For example, with the mask 0b10101001 == 0xA9 like above and 8-bit data abcdefgh (where a-h are the 8 bits), you can use the expression below to get 0000aceh:

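    #include <stdint.h>

    uint8_t compress_maskA9(uint8_t x)
    {
        // Compute in uint32_t: with plain int the products overflow, which is undefined behavior
        const uint32_t mask1 = 0xA9 & 0xF0;   // bits a and c
        const uint32_t mask2 = 0xA9 & 0x0F;   // bits e and h
        return (uint8_t)(((((x & mask1) * 0x03000000u) >> 28) & 0x0C) |
                         (((x & mask2) * 0x50000000u) >> 30));
    }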
    

    In this specific case, some shifted copies of the 4 bits would overlap when they are added up during the multiplication (causing unwanted carries), so I've split the work into 2 parts: the first extracts bits a and c, the second extracts e and h. There are other ways to split the bits as well, like a & h then c & e. You can compare the results with Harold's function live on ideone.
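    A minimal local sketch of such a comparison (not the ideone code): `compress_ref` is a hypothetical bit-by-bit reference standing in for Harold's function, and the arithmetic is done in `uint32_t` so the multiplications wrap around instead of overflowing a signed `int`.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical reference: move the bits selected by m to the bottom, one by one. */
    static uint32_t compress_ref(uint32_t x, uint32_t m)
    {
        uint32_t r = 0;
        unsigned out = 0;                      /* next free bit position in r */
        for (unsigned i = 0; i < 32; i++)
            if (m & (1u << i))
                r |= ((x >> i) & 1u) << out++;
        return r;
    }

    /* Two-multiplication version for mask 0xA9 (abcdefgh -> 0000aceh). */
    static uint8_t compress_maskA9(uint8_t x)
    {
        const uint32_t mask1 = 0xA9 & 0xF0;    /* bits a and c */
        const uint32_t mask2 = 0xA9 & 0x0F;    /* bits e and h */
        return (uint8_t)(((((x & mask1) * 0x03000000u) >> 28) & 0x0C) |
                         (((x & mask2) * 0x50000000u) >> 30));
    }

    /* True if compress_maskA9 matches the reference for all 256 inputs. */
    static bool check_maskA9(void)
    {
        for (uint32_t x = 0; x < 256; x++)
            if (compress_maskA9((uint8_t)x) != compress_ref(x, 0xA9))
                return false;
        return true;
    }
    ```

    Because a byte has only 256 values, the exhaustive loop is a complete check for this mask.
    
    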

    An alternative that uses only one multiplication:

    const uint32_t X = (x << 8) | x;
    return (X & 0x8821)*0x12050000 >> 28;
    

    I got this by duplicating the bits so that they're spaced farther apart, leaving enough room to avoid carries. This is often better than splitting into 2 multiplications.
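    One way to see why the duplicated copy helps: if the shifted copies of the AND mask that the multiplication adds together never share a bit position, the additions cannot carry, for any input. The sketch below checks that condition. `carry_free` is a hypothetical helper, and the rejected magic number 0x17000000 is an illustrative attempt (not from the answer) to move a, c, e and h into bits 31..28 from a single, undoubled byte.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* True if the partial products (and_mask << s), one per set bit s of magic,
     * are pairwise disjoint in 32 bits. Then y * magic equals the OR of the
     * shifted copies of y for every y that is a subset of and_mask: no carries. */
    static bool carry_free(uint32_t and_mask, uint32_t magic)
    {
        uint32_t seen = 0;
        for (unsigned s = 0; s < 32; s++) {
            if (!(magic & (1u << s)))
                continue;
            uint32_t partial = and_mask << s;  /* bits shifted past 31 are dropped */
            if (seen & partial)
                return false;                  /* two copies overlap: carry possible */
            seen |= partial;
        }
        return true;
    }
    ```

    With this, `carry_free(0x8821, 0x12050000u)` holds, while `carry_free(0xA9, 0x17000000u)` does not, because the copies shifted by 24 and 26 both land a bit on position 31.
    
    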


    If you want the result's bits reversed (i.e. 0000heca instead of 0000aceh), you can simply change the magic numbers accordingly:

    // result: he00 | 00ca
    return (((x & 0x09)*0x88000000u >> 28) & 0x0C) | (((x & 0xA0)*0x04800000u) >> 30);
    

    Or you can extract the 3 bits e, c and a at the same time and handle h separately (as I mentioned above, there are often multiple solutions), which needs only one multiplication:

    return ((x & 0xA8)*0x12400000u >> 29) | (x & 0x01) << 3; // result: 0eca | h000
    

    But there may be a better alternative, like the second snippet above:

    const uint32_t X = (x << 8) | x;
    return (X & 0x2881)*0x80290000 >> 28;
    

    Correctness check: http://ideone.com/PYUkty
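    A local version of that check for the reversed-order snippets, assuming each `return` statement above is wrapped in a function taking a `uint8_t x` (the function names are hypothetical). The magic constants carry a `u` suffix so the multiplications are done in unsigned arithmetic.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Two multiplications: he00 | 00ca. */
    static uint8_t heca_v1(uint8_t x)
    {
        return (uint8_t)((((x & 0x09) * 0x88000000u >> 28) & 0x0C) |
                         ((x & 0xA0) * 0x04800000u >> 30));
    }

    /* One multiplication for e, c, a; h handled separately: 0eca | h000. */
    static uint8_t heca_v2(uint8_t x)
    {
        return (uint8_t)(((x & 0xA8) * 0x12400000u >> 29) | (x & 0x01) << 3);
    }

    /* Duplicated byte, one multiplication. */
    static uint8_t heca_v3(uint8_t x)
    {
        const uint32_t X = ((uint32_t)x << 8) | x;   /* abcdefghabcdefgh */
        return (uint8_t)((X & 0x2881) * 0x80290000u >> 28);
    }

    /* Expected 0000heca, built bit by bit: h = bit 0, e = bit 3, c = bit 5, a = bit 7. */
    static uint8_t heca_expected(uint8_t x)
    {
        return (uint8_t)(((x >> 0) & 1) << 3 | ((x >> 3) & 1) << 2 |
                         ((x >> 5) & 1) << 1 | ((x >> 7) & 1));
    }

    /* True if all three variants agree with the expected value for every byte. */
    static bool check_heca(void)
    {
        for (unsigned x = 0; x < 256; x++) {
            uint8_t want = heca_expected((uint8_t)x);
            if (heca_v1((uint8_t)x) != want || heca_v2((uint8_t)x) != want ||
                heca_v3((uint8_t)x) != want)
                return false;
        }
        return true;
    }
    ```
    
    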

    For a larger number of masks, you can precompute the magic numbers corresponding to those masks and store them in an array, so that you can look them up immediately. I calculated those magic numbers by hand, but you can do that automatically.
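    A sketch of how the search could be automated for the duplicated-byte form, assuming the result should have the lowest selected bit in bit 0 (the 0000aceh order for 0xA9): for each selected bit, try taking it from either copy of the byte, derive the AND mask and the magic number, and keep the first pair that matches a bit-by-bit reference for all 256 inputs. `find_magic` and `compress_ref` are hypothetical names, and the search simply reports failure when no combination works.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical bit-by-bit reference: lowest selected bit ends up in bit 0. */
    static uint32_t compress_ref(uint32_t x, uint32_t m)
    {
        uint32_t r = 0;
        unsigned out = 0;
        for (unsigned i = 0; i < 8; i++)
            if (m & (1u << i))
                r |= ((x >> i) & 1u) << out++;
        return r;
    }

    /* Look for (and_mask, magic) such that, for every byte x,
     *   ((((x << 8) | x) & and_mask) * magic) >> (32 - k) == compress_ref(x, m),
     * where k is the number of set bits in m. Returns false if m is 0 or if
     * no combination of copies passes the exhaustive test. */
    static bool find_magic(uint8_t m, uint32_t *and_mask, uint32_t *magic)
    {
        unsigned pos[8], k = 0;
        for (unsigned i = 0; i < 8; i++)
            if (m & (1u << i))
                pos[k++] = i;                    /* positions of the selected bits */
        if (k == 0)
            return false;

        for (unsigned choice = 0; choice < (1u << k); choice++) {
            uint32_t am = 0, mg = 0;
            for (unsigned j = 0; j < k; j++) {
                unsigned src = pos[j] + ((choice >> j) & 1u) * 8;  /* low or high copy */
                unsigned dst = 32 - k + j;       /* j-th result bit, in the top k bits */
                am |= 1u << src;
                mg |= 1u << (dst - src);         /* dst >= 24 and src <= 15: positive shift */
            }
            bool ok = true;
            for (uint32_t x = 0; x < 256 && ok; x++) {
                uint32_t X = (x << 8) | x;
                ok = (((X & am) * mg) >> (32 - k)) == compress_ref(x, m);
            }
            if (ok) {
                *and_mask = am;
                *magic = mg;
                return true;
            }
        }
        return false;
    }
    ```

    The (and_mask, magic) pairs found this way could then be stored in an array indexed by mask. For 0xA9 one valid pair is the hand-derived 0x8821 / 0x12050000, though the search may return a different one.
    
    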


    Explanation

    We have abcdefgh & mask1 = a0c00000. Multiply it by magic1:

        ........................a0c00000
     ×  00000011000000000000000000000000 (magic1 = 0x03000000)
        ────────────────────────────────
        a0c00000........................
     + a0c00000......................... (the leading "a" bit falls outside the 32-bit
        ────────────────────────────────  range, so it is truncated)
    r1 = acc.............................
    
    => (r1 >> 28) & 0x0C = 0000ac00
    

    Similarly, we multiply abcdefgh & mask2 = 0000e00h by magic2:

      ........................0000e00h
    × 01010000000000000000000000000000 (magic2 = 0x50000000)
      ────────────────────────────────
      e00h............................
    + 0h..............................
      ────────────────────────────────
    r2 = eh..............................
    
    => (r2 >> 30) = 000000eh
    

    Combining them gives the expected result:

    ((r1 >> 28) & 0x0C) | (r2 >> 30) = 0000aceh
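    The same steps for one concrete input, as a small sketch: x = 0xA1 (a = 1, c = 1, e = 0, h = 1, chosen arbitrarily), so the expected result is 0000aceh = 0b1101 = 0x0D. The names r1 and r2 follow the walkthrough; `walkthrough` and `struct steps` are hypothetical, and the arithmetic is done in `uint32_t` so the truncation described above is well defined.

    ```c
    #include <stdint.h>

    /* Intermediate values of the walkthrough for one input byte. */
    struct steps { uint32_t r1, r2, result; };

    static struct steps walkthrough(uint8_t x)
    {
        struct steps s;
        s.r1 = (x & 0xA0u) * 0x03000000u;   /* magic1: copies shifted by 24 and 25 */
        s.r2 = (x & 0x09u) * 0x50000000u;   /* magic2: copies shifted by 28 and 30 */
        s.result = ((s.r1 >> 28) & 0x0C) | (s.r2 >> 30);
        return s;
    }
    ```

    For x = 0xA1, r1 = 0xE0000000 (top bits a, c, c, 0 = 1110) and r2 = 0x50000000 (top bits e, h, 0, h = 0101), matching the diagrams.
    
    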
    

    And here's the walkthrough for the second snippet (one multiplication on the duplicated byte):

                      abcdefghabcdefgh
    &                 1000100000100001 (0x8821)
      ────────────────────────────────
                      a000e00000c0000h
    × 00010010000001010000000000000000 (0x12050000)
      ────────────────────────────────
      000h
      00e00000c0000h
    + 0c0000h
      a000e00000c0000h
      ────────────────────────────────
    = acehe0h0c0c00h0h
    & 11110000000000000000000000000000
      ────────────────────────────────
    = aceh
    

    For the reversed order case:

                      abcdefghabcdefgh
    &                 0010100010000001 (0x2881)
      ────────────────────────────────
                      00c0e000a000000h
    × 10000000001010010000000000000000 (0x80290000)
      ────────────────────────────────
      000a000000h
      00c0e000a000000h
    + 0e000a000000h
      h
      ────────────────────────────────
      hecaea00a0h0h00h
    & 11110000000000000000000000000000
      ────────────────────────────────
    = heca
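    Both diagrams can also be checked mechanically. The sketch below (helper names are hypothetical) turns a 16-character row such as `acehe0h0c0c00h0h` into a number by substituting the bits of x for the letters, and confirms that it equals the full 32-bit product shifted down by 16 for every byte x. The smallest shift in both magic numbers is 16, so the low 16 bits of the product are always zero.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Value of a diagram row: 'a'..'h' stand for bits 7..0 of x, '0' for zero. */
    static uint32_t row_value(const char *row, uint8_t x)
    {
        uint32_t v = 0;
        for (const char *p = row; *p; p++) {
            v <<= 1;
            if (*p >= 'a' && *p <= 'h')
                v |= (x >> (7 - (*p - 'a'))) & 1u;
        }
        return v;
    }

    /* Does (X & and_mask) * magic equal row << 16 for all 256 bytes x? */
    static bool diagram_matches(const char *row, uint32_t and_mask, uint32_t magic)
    {
        for (unsigned x = 0; x < 256; x++) {
            uint32_t X = (x << 8) | x;               /* abcdefghabcdefgh */
            if ((X & and_mask) * magic != row_value(row, (uint8_t)x) << 16)
                return false;
        }
        return true;
    }
    ```
    
    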
    

    Related:

    • How to create a byte out of 8 bool values (and vice versa)?
    • Redistribute least significant bits from a 4-byte array to a nibble
  • 2020-12-16 15:08

    This operation is known as compress right. It is implemented in BMI2 as the PEXT instruction, available on Intel processors since Haswell.
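    A minimal sketch of using PEXT through the `_pext_u32` intrinsic from `<immintrin.h>`, assuming GCC or Clang, which define `__BMI2__` when compiling for BMI2 (for example with `-mbmi2`); otherwise the sketch falls back to a plain bit-by-bit loop. `compress_right` is a hypothetical name.

    ```c
    #include <stdint.h>
    #if defined(__BMI2__)
    #include <immintrin.h>
    #endif

    /* Compress right: gather the bits of x selected by m into the low bits. */
    static uint32_t compress_right(uint32_t x, uint32_t m)
    {
    #if defined(__BMI2__)
        return _pext_u32(x, m);                  /* single PEXT instruction */
    #else
        uint32_t r = 0;                          /* portable fallback */
        unsigned out = 0;
        for (unsigned i = 0; i < 32; i++)
            if (m & (1u << i))
                r |= ((x >> i) & 1u) << out++;
        return r;
    #endif
    }
    ```

    A binary built with BMI2 enabled only runs on CPUs that support it, so a runtime CPU check may be needed if the same binary must also run on older processors.
    
    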

    Unfortunately, without hardware support it is quite an annoying operation. There is of course an obvious solution, moving the bits one by one in a loop; here is the one given in Hacker's Delight:

    unsigned compress(unsigned x, unsigned m) {
       unsigned r, s, b;    // Result, shift, mask bit. 
    
       r = 0; 
       s = 0; 
       do {
          b = m & 1; 
          r = r | ((x & b) << s); 
          s = s + b; 
          x = x >> 1; 
          m = m >> 1; 
       } while (m != 0); 
       return r; 
    } 
    

    But there is another way, also given in Hacker's Delight, which loops fewer times (the number of iterations is logarithmic in the number of bits) but does more work per iteration:

    unsigned compress(unsigned x, unsigned m) {
       unsigned mk, mp, mv, t; 
       int i; 
    
       x = x & m;           // Clear irrelevant bits. 
       mk = ~m << 1;        // We will count 0's to right. 
    
       for (i = 0; i < 5; i++) {
          mp = mk ^ (mk << 1);             // Parallel prefix. 
          mp = mp ^ (mp << 2); 
          mp = mp ^ (mp << 4); 
          mp = mp ^ (mp << 8); 
          mp = mp ^ (mp << 16); 
          mv = mp & m;                     // Bits to move. 
          m = m ^ mv | (mv >> (1 << i));   // Compress m. 
          t = x & mv; 
          x = x ^ t | (t >> (1 << i));     // Compress x. 
          mk = mk & ~mp; 
       } 
       return x; 
    }
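    A quick way to gain some confidence that the two Hacker's Delight versions agree is to compare them exhaustively on small inputs. This sketch assumes a 32-bit `unsigned` (the five rounds cover shifts of 1, 2, 4, 8 and 16); the names `compress_loop`, `compress_log` and `agree_on_9_bits` are hypothetical, and restricting the check to 9-bit masks and inputs is an arbitrary choice.

    ```c
    #include <stdbool.h>

    /* Bit-by-bit version (the first one above). */
    static unsigned compress_loop(unsigned x, unsigned m)
    {
        unsigned r = 0, s = 0, b;
        do {
            b = m & 1;
            r = r | ((x & b) << s);
            s = s + b;
            x = x >> 1;
            m = m >> 1;
        } while (m != 0);
        return r;
    }

    /* Logarithmic version (the second one above), with explicit parentheses. */
    static unsigned compress_log(unsigned x, unsigned m)
    {
        unsigned mk, mp, mv, t;
        x = x & m;
        mk = ~m << 1;
        for (int i = 0; i < 5; i++) {
            mp = mk ^ (mk << 1);
            mp = mp ^ (mp << 2);
            mp = mp ^ (mp << 4);
            mp = mp ^ (mp << 8);
            mp = mp ^ (mp << 16);
            mv = mp & m;
            m = (m ^ mv) | (mv >> (1 << i));
            t = x & mv;
            x = (x ^ t) | (t >> (1 << i));
            mk = mk & ~mp;
        }
        return x;
    }

    /* Compare both versions for every 9-bit mask and every 9-bit input. */
    static bool agree_on_9_bits(void)
    {
        for (unsigned m = 0; m < 512; m++)
            for (unsigned x = 0; x < 512; x++)
                if (compress_loop(x, m) != compress_log(x, m))
                    return false;
        return true;
    }
    ```
    
    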
    

    Notice that many of the values there depend only on m. Since you only have 512 different masks, you could precompute those and simplify the code to something like this (not tested):

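    // masks[k][0] is mask k itself; masks[k][i + 1] is the mv ("bits to move")
    // value that round i of the loop above computes for that mask.
    extern unsigned masks[512][6];

    unsigned compress(unsigned x, int maskindex) {
       unsigned t;
       int i;

       x = x & masks[maskindex][0];       // Clear irrelevant bits.

       for (i = 0; i < 5; i++) {
          t = x & masks[maskindex][i + 1]; // Bits to move right by 1 << i.
          x = x ^ t | (t >> (1 << i));
       }
       return x;
    }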
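    A sketch of how the table could be filled, assuming the layout the code above implies: `masks[k][0]` holds mask k itself and `masks[k][i + 1]` holds the `mv` value that round `i` of the Hacker's Delight loop computes for that mask. Only the `m`/`mk` half of the loop is kept, since it never looks at x. The helper names are hypothetical, and the table is checked against a bit-by-bit reference for all 9-bit inputs.

    ```c
    #include <stdbool.h>

    /* Fill entry[0] with m and entry[i + 1] with the bits to move in round i. */
    static void build_mask_entry(unsigned m, unsigned entry[6])
    {
        unsigned mk = ~m << 1, mp, mv;
        entry[0] = m;
        for (int i = 0; i < 5; i++) {
            mp = mk ^ (mk << 1);
            mp = mp ^ (mp << 2);
            mp = mp ^ (mp << 4);
            mp = mp ^ (mp << 8);
            mp = mp ^ (mp << 16);
            mv = mp & m;
            entry[i + 1] = mv;
            m = (m ^ mv) | (mv >> (1 << i));   /* compress m, as the loop does */
            mk = mk & ~mp;
        }
    }

    /* Table-driven compress using one precomputed entry. */
    static unsigned compress_with(unsigned x, const unsigned entry[6])
    {
        x = x & entry[0];
        for (int i = 0; i < 5; i++) {
            unsigned t = x & entry[i + 1];
            x = (x ^ t) | (t >> (1 << i));
        }
        return x;
    }

    static unsigned masks[512][6];             /* one entry per 9-bit mask */

    /* Bit-by-bit reference, used only for checking. */
    static unsigned compress_ref(unsigned x, unsigned m)
    {
        unsigned r = 0, out = 0;
        for (unsigned i = 0; i < 32; i++)
            if (m & (1u << i))
                r |= ((x >> i) & 1u) << out++;
        return r;
    }

    /* Build the whole table, then compare against the reference. */
    static bool table_matches_ref(void)
    {
        for (unsigned m = 0; m < 512; m++)
            build_mask_entry(m, masks[m]);
        for (unsigned m = 0; m < 512; m++)
            for (unsigned x = 0; x < 512; x++)
                if (compress_with(x, masks[m]) != compress_ref(x, m))
                    return false;
        return true;
    }
    ```
    
    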
    

    Of course, all of these can be turned into "not a loop" by unrolling; the second and third versions are probably better suited to that. That's a bit of a cheat, however.
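    Unrolling the table-driven version just means writing out the five rounds. In this sketch the entry for mask 0xA9 was worked out by hand for illustration (in practice it would come from the precomputed table), and an exhaustive comparison with a bit-by-bit reference checks it. `compress_unrolled`, `entry_A9` and `compress_ref` are hypothetical names.

    ```c
    #include <stdbool.h>

    /* Table-driven compress with the five rounds written out (no loop).
     * entry[0] is the mask; entry[i + 1] holds the bits to move right by 1 << i. */
    static unsigned compress_unrolled(unsigned x, const unsigned entry[6])
    {
        unsigned t;
        x &= entry[0];
        t = x & entry[1]; x = (x ^ t) | (t >> 1);
        t = x & entry[2]; x = (x ^ t) | (t >> 2);
        t = x & entry[3]; x = (x ^ t) | (t >> 4);
        t = x & entry[4]; x = (x ^ t) | (t >> 8);
        t = x & entry[5]; x = (x ^ t) | (t >> 16);
        return x;
    }

    /* Entry for mask 0xA9 (bits 0, 3, 5, 7), worked out by hand: bit 5 moves
     * right by 1 (to bit 4), then bits 3 and 4 move by 2, then bit 7 moves by 4. */
    static const unsigned entry_A9[6] = { 0xA9, 0x20, 0x18, 0x80, 0, 0 };

    /* Bit-by-bit reference, used only for checking. */
    static unsigned compress_ref(unsigned x, unsigned m)
    {
        unsigned r = 0, out = 0;
        for (unsigned i = 0; i < 32; i++)
            if (m & (1u << i))
                r |= ((x >> i) & 1u) << out++;
        return r;
    }

    /* True if the unrolled version matches the reference for every byte. */
    static bool unrolled_matches_ref(void)
    {
        for (unsigned x = 0; x < 256; x++)
            if (compress_unrolled(x, entry_A9) != compress_ref(x, 0xA9))
                return false;
        return true;
    }
    ```
    
    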
