Fastest way to zero out a 2d array in C?

前端未结

关注

 12  1381

后悔当初 2021-01-29 18:58

I want to repeatedly zero a large 2d array in C. This is what I do at the moment:

// Array of size n * m, where n may not equal m
for(j = 0; j < n; j++)
{


      
      
        
          12条回答        

        
                    
            
            
                         
                
              
              
                
                   独厮守ぢ
                                             
                
                
                (楼主)
            
              
              
                2021-01-29 19:12
              

            
            
                        
If you are really, really obsessed with speed (and not so much with portability) I think the absolute fastest way to do this would be to use SIMD vector intrinsics. e.g. on Intel CPUs, you could use these SSE2 instructions:

__m128i _mm_setzero_si128 ();                   // Create a quadword with a value of 0.
void _mm_storeu_si128 (__m128i *p, __m128i a);  // Write a quadword to the specified address.


Each store instruction will set four 32-bit ints to zero in one hit.

p must be 16-byte aligned, but this restriction is also good for speed because it will help the cache. The other restriction is that p must point to an allocation size that is a multiple of 16-bytes, but this is cool too because it allows us to unroll the loop easily.

Have this in a loop, and unroll the loop a few times, and you will have a crazy fast initialiser:

// Assumes int is 32-bits.
const int mr = roundUpToNearestMultiple(m, 4);      // This isn't the optimal modification of m and n, but done this way here for clarity.    
const int nr = roundUpToNearestMultiple(n, 4);    

int i = 0;
int array[mr][nr] __attribute__ ((aligned (16)));   // GCC directive.
__m128i* px = (__m128i*)array;
const int incr = s >> 2;                            // Unroll it 4 times.
const __m128i zero128 = _mm_setzero_si128();

for(i = 0; i < s; i += incr)
{
    _mm_storeu_si128(px++, zero128);
    _mm_storeu_si128(px++, zero128);
    _mm_storeu_si128(px++, zero128);
    _mm_storeu_si128(px++, zero128);
}


There is also a variant of _mm_storeu that bypasses the cache (i.e. zeroing the array won't pollute the cache) which could give you some secondary performance benefits in some circumstances. 

See here for SSE2 reference: http://msdn.microsoft.com/en-us/library/kcwz153a(v=vs.80).aspx
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它12个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复