How would you transpose a binary matrix?

前端未结

关注

 7  985

-上瘾入骨i 2020-12-16 02:13

I have binary matrices in C++ that I repesent with a vector of 8-bit values.

For example, the following matrix:

1 0 1 0 1 0 1
0 1 1 0 0 1 1
0 0 0 1 1


      
      
        
          7条回答        

        
                    
            
            
                         
                
              
              
                
                   时光取名叫无心
                                             
                
                
                (楼主)
            
              
              
                2020-12-16 02:58
              

            
            
                        
Here's what I posted on gitub (mischasan/sse2/ssebmx.src)
Changing INP() and OUT() to use induction vars saves an IMUL each.
AVX256 does it twice as fast.
AVX512 is not an option, because there is no _mm512_movemask_epi8().

#include 
#include 

#define INP(x,y) inp[(x)*ncols/8 + (y)/8]
#define OUT(x,y) out[(y)*nrows/8 + (x)/8]

void ssebmx(char const *inp, char *out, int nrows, int ncols)
{
    int rr, cc, i, h;
    union { __m128i x; uint8_t b[16]; } tmp;

    // Do the main body in [16 x 8] blocks:
    for (rr = 0; rr <= nrows - 16; rr += 16)
        for (cc = 0; cc < ncols; cc += 8) {
            for (i = 0; i < 16; ++i)
                tmp.b[i] = INP(rr + i, cc);
            for (i = 8; i--; tmp.x = _mm_slli_epi64(tmp.x, 1))
                *(uint16_t*)&OUT(rr, cc + i) = _mm_movemask_epi8(tmp.x);
        }

    if (rr == nrows) return;

    // The remainder is a row of [8 x 16]* [8 x 8]?

    //  Do the [8 x 16] blocks:
    for (cc = 0; cc <= ncols - 16; cc += 16) {
        for (i = 8; i--;)
            tmp.b[i] = h = *(uint16_t const*)&INP(rr + i, cc),
            tmp.b[i + 8] = h >> 8;
        for (i = 8; i--; tmp.x = _mm_slli_epi64(tmp.x, 1))
            OUT(rr, cc + i) = h = _mm_movemask_epi8(tmp.x),
            OUT(rr, cc + i + 8) = h >> 8;
    }

    if (cc == ncols) return;

    //  Do the remaining [8 x 8] block:
    for (i = 8; i--;)
        tmp.b[i] = INP(rr + i, cc);
    for (i = 8; i--; tmp.x = _mm_slli_epi64(tmp.x, 1))
        OUT(rr, cc + i) = _mm_movemask_epi8(tmp.x);
}


HTH.
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它7个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复