Fast memory transpose with SSE, AVX, and OpenMP

前端未结

关注

 3  374

清歌不尽 2020-12-30 08:22

I need a fast memory transpose algorithm for my Gaussian convolution function in C/C++. What I do now is

convolute_1D
transpose
convolute_1D
transpose


      
      
        
          3条回答        

        
                    
            
            
                         
                
              
              
                
                   时光取名叫无心
                                             
                
                
                (楼主)
            
              
              
                2020-12-30 09:07
              

            
            
                        
I'd guess that your best bet would be to try and combine the convolution and the transpose - i.e. write out the results of the convolve transposed as you go. You're almost certainly memory bandwidth limited on the transpose so reducing the number of instructions used for the transpose isn't really going to help (hence the lack of improvement from using AVX). Reducing the number of passes over your data is going to give you the best performance improvements.
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它3个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复