How would you transpose a binary matrix?

前端未结

关注

 7  1016

-上瘾入骨i 2020-12-16 02:13

I have binary matrices in C++ that I repesent with a vector of 8-bit values.

For example, the following matrix:

1 0 1 0 1 0 1
0 1 1 0 0 1 1
0 0 0 1 1


      
      
        
          7条回答        

        
                    
            
            
                         
                
              
              
                
                   情书的邮戳
                                             
                
                
                (楼主)
            
              
              
                2020-12-16 03:06
              

            
            
                        
I've spent more time looking for a solution, and I've found some good ones.

The SSE2 way

On a modern x86 CPU, transposing a binary matrix can be done very efficiently with SSE2 instructions. Using such instructions it is possible to process a 16×8 matrix.

This solution is inspired by this blog post by mischasan and is vastly superior to every suggestion I've got so far to this question.

The idea is simple:


#include 
Pack 16 uint8_t variables into an __m128i
Use _mm_movemask_epi8 to get the MSBs of each byte, producing an uint16_t
Use _mm_slli_epi64 to shift the 128-bit register by one
Repeat until you've got all 8 uint16_ts


A generic 32-bit solution

Unfortunately, I also need to make this work on ARM. After implementing the SSE2 version, it would be easy to just just find the NEON equivalents, but the Cortex-M CPU, (contrary to the Cortex-A) does not have SIMD capabilities, so NEON isn't too useful for me at the moment.

NOTE: Because the Cortex-M doesn't have native 64-bit arithmetics, I could not use the ideas in any answers that suggest to do it by treating a 8x8 block as an uint64_t. Most microcontrollers that have a Cortex-M CPU also don't have too much memory so I prefer to do all this without a lookup table.

After some thinking, the same algorithm can be implemented using plain 32-bit arithmetics and some clever coding. This way, I can work with 4×8 blocks at a time. It was suggested by a collegaue and the magic lies in the way 32-bit multiplication works: you can find a 32-bit number with which you can multiply and then the MSB of each byte gets next to each other in the upper 32 bits of the result.


Pack 4 uint8_ts in a 32-bit variable
Mask the 1st bit of each byte (using 0x80808080)
Multiply it with 0x02040810
Take the 4 LSBs of the upper 32 bits of the multiplication
Generally, you can mask the Nth bit in each byte (shift the mask right by N bits) and multiply with the magic number, shifted left by N bits. The advantage here is that if your compiler is smart enough to unroll the loop, both the mask and the 'magic number' become compile-time constants so shifting them does not incur any performance penalty whatsoever. There's some trouble with the last series of 4 bits, because then one LSB is lost, so in that case I needed to shift the input left by 8 bits and use the same method as the first series of 4-bits.


If you do this with two 4×8 blocks, then you can get an 8x8 block done and arrange the resulting bits so that everything goes into the right place.
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它7个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复