Calculating matrix product is much slower with SSE than with straight-forward-algorithm

滥情空心 2021-01-03 05:40

I want to multiply two matrices, once using the straightforward algorithm:

template <typename T>
void multiplicate_straight(T ** A, T ** B, T *

2 answers

夕颜 (original poster) 2021-01-03 06:16
                        
I believe this should do the same thing as the first loop with SSE, assuming sizeX is a multiple of two and the memory is 16-byte aligned.

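You may gain a bit more performance by unrolling the loop and using multiple temp variables, which you add together at the end. Independent accumulators let consecutive additions overlap instead of each one waiting on the previous result. You could also try AVX and the fused multiply-add (FMA) instructions.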

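#include <emmintrin.h> // SSE2 intrinsics (__m128d, _mm_*_pd)

// AllocateDynamicArray2D, transpose_matrix and FreeDynamicArray2D are the
// helpers from the question. _mm_load_pd operates on double, so this
// function only compiles for T = double.
template <typename T>
void multiplicate_SSE2(T ** A, T ** B, T ** C, int sizeX)
{
    // Transpose B into D so that both operands are read row-wise
    T ** D = AllocateDynamicArray2D<T>(sizeX, sizeX);
    transpose_matrix(B, D, sizeX);
    for(int i = 0; i < sizeX; i++)
    {
        for(int j = 0; j < sizeX; j++)
        {
            __m128d temp = _mm_setzero_pd();
            // Two doubles per iteration: needs even sizeX and 16-byte-aligned rows
            for(int g = 0; g < sizeX; g += 2)
            {
                __m128d a = _mm_load_pd(&A[i][g]);
                __m128d b = _mm_load_pd(&D[j][g]);
                temp = _mm_add_pd(temp, _mm_mul_pd(a,b));
            }
            // Add top and bottom half of temp together
            temp = _mm_add_pd(temp, _mm_shuffle_pd(temp, temp, 1));
            _mm_store_sd(&C[i][j], temp); // Store one value (destination pointer comes first)
        }
    }
    FreeDynamicArray2D(D);
}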
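
The unrolling suggestion above can be sketched roughly as follows. This is only an illustration, not code from the original post: the helper name dot_sse2_unrolled is made up, it assumes T is double, and it uses unaligned loads plus a scalar tail so that it does not depend on the alignment and multiple-of-two assumptions.

```cpp
#include <emmintrin.h> // SSE2 intrinsics
#include <cstddef>

// Hypothetical helper (not from the original post): dot product of two
// rows of doubles using two independent SSE2 accumulators.
inline double dot_sse2_unrolled(const double* a, const double* b, std::size_t n)
{
    __m128d acc0 = _mm_setzero_pd();
    __m128d acc1 = _mm_setzero_pd();
    std::size_t g = 0;
    // Main loop: four doubles per iteration, two per accumulator.
    // _mm_loadu_pd tolerates unaligned addresses.
    for (; g + 4 <= n; g += 4)
    {
        acc0 = _mm_add_pd(acc0, _mm_mul_pd(_mm_loadu_pd(a + g),
                                           _mm_loadu_pd(b + g)));
        acc1 = _mm_add_pd(acc1, _mm_mul_pd(_mm_loadu_pd(a + g + 2),
                                           _mm_loadu_pd(b + g + 2)));
    }
    // Combine the two accumulators, then add the high and low halves.
    __m128d sum = _mm_add_pd(acc0, acc1);
    sum = _mm_add_pd(sum, _mm_shuffle_pd(sum, sum, 1));
    double result = _mm_cvtsd_f64(sum);
    // Scalar tail for the remaining 0..3 elements.
    for (; g < n; ++g)
        result += a[g] * b[g];
    return result;
}
```

With such a helper, the inner loop over g in multiplicate_SSE2 could be replaced by C[i][j] = dot_sse2_unrolled(A[i], D[j], sizeX). Whether this is actually faster depends on the compiler and CPU, so it is worth measuring.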
