CUDA Device To Device transfer expensive

前端未结

关注

 2  1772

I have written some code to try to swap quadrants of a 2D matrix for FFT purposes, that is stored in a flat array.

    int leftover = W-dcW;

    T *temp;


                      
              相关标签:


      
      
        
          2条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  礼貌的吻别        
                
              
                            
                2020-12-18 08:47
              
            
            
                                                                       
I ended up writing a kernel to do the swaps. This was indeed faster than the Device to Device memcpy operations
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  醉话见心        
                
              
                            
                2020-12-18 08:51
              
            
            
                                                                       
Perhaps the following solution to perform the 2d fftshift in CUDA would be of interest:

#define IDX2R(i,j,N) (((i)*(N))+(j))

__global__ void fftshift_2D(double2 *data, int N1, int N2)
{
    int i = threadIdx.y + blockDim.y * blockIdx.y;
    int j = threadIdx.x + blockDim.x * blockIdx.x;

    if (i < N1 && j < N2) {
        double a = pow(-1.0, (i+j)&1);

        data[IDX2R(i,j,N2)].x *= a;
        data[IDX2R(i,j,N2)].y *= a;
    }
}


It consists in multiplying the matrix to be transformed by a chessboard of 1s and -1s which is equivalent to the multiplication by exp(-j*(n+m)*pi) and thus to shifts in both directions in the conjugate domain. 

You have to call this kernel before and after the application of the CUFFT.

One pro is that memory movements/swapping are avoided.

IMPROVEMENT IN SPEED

Following the suggestion received at the NVIDIA Forum, improved speed can be achieved as by changing the instruction

double a = pow(-1.0,(i+j)&1);


to

double a = 1-2*((i+j)&1);


to avoid the use of the slow routine pow.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复