Fusing a triangle loop for parallelization, calculating sub-indices

A common technique in parallelization is to fuse nested for loops like this:

for(int i=0; i

to


                      
3 Answers

我在风中等你
2020-12-10 07:12

Considering that you're trying to fuse a triangle with the intent of parallelizing, the non-obvious solution is to choose a non-trivial mapping of x to (i,j):

j |\ i ->
  | \             ____
| |  \    =>    |\\   |
V |___\         |_\\__|


After all, you're not processing them in any special order, so the exact mapping is a don't care.

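So calculate x -> (i, j) as you would for a rectangle, but if i > j, set { i = N-i, j = N-j } (mirror on the Y axis, then on the X axis). The exact offsets depend on the rectangle's dimensions and on whether the indices are 0-based.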

   ____
 |\\   |      |\           |\
 |_\\__|  ==> |_\  __  =>  | \
                  / |      |  \
                 /__|      |___\
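
One way to make the fold concrete is sketched below. It assumes the inclusive triangle 0 <= j <= i < n, which has n*(n+1)/2 cells, and it pairs each long triangle row with a short one so that every rectangle row has the same width. The helper name `fold_index` and this particular pairing are illustrative choices, not necessarily the exact mirroring formula given in this answer.

```c
#include <assert.h>

/* Hypothetical helper: map x in [0, n*(n+1)/2) to a distinct (i, j) with
   0 <= j <= i < n. Assumption: the order in which cells are visited is
   irrelevant, and n is small enough that n*(n+1)/2 fits in an int. */
static void fold_index(int x, int n, int *i, int *j)
{
    assert(n > 0 && x >= 0 && x < n * (n + 1) / 2);
    int w = n | 1;      /* rectangle width: n + 1 if n is even, n if n is odd */
    int r = x / w;      /* rectangle row */
    int c = x % w;      /* rectangle column */
    int len = n - r;    /* length of the long triangle row n-1-r */
    if (c < len) {      /* left part: cell of the long row, column unchanged */
        *i = n - 1 - r;
        *j = c;
    } else {            /* right part: folded onto the paired short row */
        *i = r - 1 + (w - n);
        *j = c - len;
    }
}
```

A fused loop over x from 0 to n*(n+1)/2 - 1 would call `fold_index(x, n, &i, &j)` at the top of each iteration. Because every x is decoded independently, the iterations can be handed to threads in any order; the cost is one division and one modulo per iteration.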

                                                                        
                                                        
            
            
              
                
一向
2020-12-10 07:13

The sanest form is, of course, the first one.

That said, the fused form is better done with conditionals:

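int i = 0; int j = 0;
for(int x=0; x<(n*(n+1)/2); x++) {
  // ... use (i, j) here; within row i, j runs from 0 to i inclusive
  ++j;
  if (j>i)
  {
    // stepped past the diagonal: start the next row
    j = 0;
    ++i;
  }
}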
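
Note that i and j in this loop carry state from one iteration to the next, so the loop cannot simply be split across threads as written. A minimal sketch of one workaround, assuming each worker is handed a contiguous range [x0, x1): decode the starting (i, j) once with integer arithmetic, then step with the same conditionals. The names `tri_start` and `tri_chunk_sum`, and the stand-in work `100*i + j`, are made up for illustration.

```c
#include <assert.h>

/* Hypothetical helper: row i and column j of linear index x in the
   triangle 0 <= j <= i, found with integer arithmetic only. */
static void tri_start(int x, int *i, int *j)
{
    assert(x >= 0);
    int row = 0;
    /* advance while the first index of the next row is still <= x */
    while ((long long)(row + 1) * (row + 2) / 2 <= x)
        ++row;
    *i = row;
    *j = x - (int)((long long)row * (row + 1) / 2);
}

/* Hypothetical worker: process indices x0 <= x < x1 of the fused loop.
   The accumulated value stands in for real work on each (i, j). */
static long long tri_chunk_sum(int x0, int x1)
{
    int i, j;
    long long acc = 0;
    tri_start(x0, &i, &j);          /* decode the start once per chunk */
    for (int x = x0; x < x1; x++) {
        acc += 100LL * i + j;       /* stand-in for the loop body */
        if (++j > i) {              /* same stepping as the loop shown before */
            j = 0;
            ++i;
        }
    }
    return acc;
}
```

Each thread would call `tri_chunk_sum` (or its real equivalent) on its own range, so the search in `tri_start` is paid once per chunk rather than once per iteration.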

                                                                        
                                                        
            
            
              
                
醉酒成梦
2020-12-10 07:14

  I'm wondering if there is a simpler or more efficient way of doing this?


Yes, the code you had to begin with. Please keep the following in mind:


There exists no case where floating point arithmetic is ever faster than plain integers. 
There are, however, plenty of cases where floating point is far slower than plain integers, FPU or no FPU.
Float variables are generally larger than plain integers on most systems and therefore slower for that reason alone.
The first version of the code is likely the most cache-friendly. As with any manual optimization, this depends entirely on what CPU you are using.
Division is generally slow on most systems, no matter if done to plain integers or floats.
Any form of complex arithmetic is going to be slower than simple counting.


So your second example is pretty much guaranteed to be far slower than the first example, for any given CPU in the world. In addition, it is also completely unreadable.
                                                                        
                                                        
            
            
              
                