Fastest way to transpose 4x4 byte matrix

猫巷女王i 2020-12-19 05:51

I have a 4x4 block of bytes that I'd like to transpose using general-purpose hardware. In other words, for bytes A-P, I'm looking for the most efficient (in terms of numbe…
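The C sketch below is a plain byte-by-byte reference that pins down what the transpose must produce. It is not meant to be fast. It assumes each row is packed into a `uint32_t` with its first byte (A, E, I or M) as the most significant byte. `get_byte` and `transpose4x4_ref` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical helper: byte i (0 = most significant) of a packed row. */
uint32_t get_byte(uint32_t row, int i)
{
    return (row >> (8 * (3 - i))) & 0xFFu;
}

/* Hypothetical reference transpose: byte i of out[k] = byte k of in[i].
   Assumes rows are packed with their first byte most significant. */
void transpose4x4_ref(const uint32_t in[4], uint32_t out[4])
{
    for (int k = 0; k < 4; k++) {
        uint32_t word = 0;
        for (int i = 0; i < 4; i++)
            word = (word << 8) | get_byte(in[i], k); /* append byte k of row i */
        out[k] = word;
    }
}
```

Any faster variant should produce the same `out` words as this loop. That makes the loop a convenient test oracle.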

5 answers
  •  抹茶落季
    2020-12-19 06:15

    An efficient solution is possible on a 64-bit machine, if you can assume one. First, shift the four 32-bit rows by 0, 1, 2 and 3 bytes respectively [3 shifts, since the 0-byte shift is free]. Then, for each output row, mask out the unwanted bytes and combine the four shifted rows with logical ORs [14 ANDs with a constant as drawn below, 12 ORs]. Finally, shift each combined value back so that its four wanted bytes form a 32-bit word [3 shifts] and read out those 32 bits.

    ABCD
    EFGH
    IJKL
    MNOP
    
    ABCD
     EFGH
      IJKL
       MNOP
    
    A---
     E---
      I---
       MNOP
    =======
    AEIMNOP
    AEIM
    
    AB--
     -F--
      -J--
       -NOP
    =======
    ABFJNOP
    BFJN
    
    ABC-
     --G-
      --K-
       --OP
    =======
    ABCGKOP
    CGKO
    
    ABCD
     ---H
      ---L
       ---P
    =======
    ABCDHLP
    DHLP
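
The shift/mask/OR scheme above can be sketched in C as follows. This is a minimal sketch under a few assumptions:

- Rows are `uint32_t` values with the first byte (A, E, I, M) most significant.
- Intermediate values are held in a 64-bit `uint64_t`.
- The diagram's seven byte positions ("lanes") map to the low 56 bits, with the leftmost position in the highest byte.
- `LANE` and `transpose4x4_shift_or` are hypothetical names.

Each mask keeps only the wanted byte, rather than copying the diagram's masks exactly. Any extra bytes left outside the four-byte window, such as NOP in the first block, are dropped by the final shift or the 32-bit truncation. The same two rows as in the diagram go unmasked, so the count is 6 shifts, 14 ANDs and 12 ORs.

```c
#include <stdint.h>

/* Hypothetical mask for diagram position p (0 = leftmost) inside the low
   56 bits of a uint64_t: position p occupies bits 8*(6-p)+7 .. 8*(6-p). */
#define LANE(p) (0xFFull << (8 * (6 - (p))))

/* Assumes rows are packed with their first byte most significant. */
void transpose4x4_shift_or(const uint32_t in[4], uint32_t out[4])
{
    /* Step 1: row i moves to lanes i..i+3 (3 shifts; row 3 needs none). */
    uint64_t r0 = (uint64_t)in[0] << 24;
    uint64_t r1 = (uint64_t)in[1] << 16;
    uint64_t r2 = (uint64_t)in[2] << 8;
    uint64_t r3 = (uint64_t)in[3];

    /* Step 2: byte k of row i now sits in lane i+k. Keep it and OR the
       rows together (14 ANDs, 12 ORs). Bytes outside lanes k..k+3 do not
       matter, because step 3 discards them, so r3 in c0 and r0 in c3 are
       left unmasked. */
    uint64_t c0 = (r0 & LANE(0)) | (r1 & LANE(1)) | (r2 & LANE(2)) | r3;
    uint64_t c1 = (r0 & LANE(1)) | (r1 & LANE(2)) | (r2 & LANE(3)) | (r3 & LANE(4));
    uint64_t c2 = (r0 & LANE(2)) | (r1 & LANE(3)) | (r2 & LANE(4)) | (r3 & LANE(5));
    uint64_t c3 = r0 | (r1 & LANE(4)) | (r2 & LANE(5)) | (r3 & LANE(6));

    /* Step 3: move lanes k..k+3 into the low 32 bits and truncate
       (3 shifts; for c3 the truncation alone is enough). */
    out[0] = (uint32_t)(c0 >> 24);
    out[1] = (uint32_t)(c1 >> 16);
    out[2] = (uint32_t)(c2 >> 8);
    out[3] = (uint32_t)c3;
}
```

With the rows `0x00010203`, `0x04050607`, `0x08090A0B` and `0x0C0D0E0F` as input, it produces `0x0004080C`, `0x0105090D`, `0x02060A0E` and `0x03070B0F`. Whether this beats a byte-wise loop depends on the compiler and target, so it is worth measuring rather than assuming.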
    
