Fastest way to zero out a 2d array in C?

后悔当初 2021-01-29 18:58

I want to repeatedly zero a large 2d array in C. This is what I do at the moment:

// Array of size n * m, where n may not equal m
for(j = 0; j < n; j++)
{
12 Answers
  •  独厮守ぢ
    2021-01-29 19:12

    If you are really, really obsessed with speed (and not so much with portability), I think the absolute fastest way to do this would be to use SIMD vector intrinsics. For example, on x86 CPUs you could use these SSE2 intrinsics (declared in <emmintrin.h>):

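    __m128i _mm_setzero_si128 ();                   // Return a 128-bit value with all bits set to 0.
    void _mm_store_si128 (__m128i *p, __m128i a);   // Write 128 bits to p; p must be 16-byte aligned.
    void _mm_storeu_si128 (__m128i *p, __m128i a);  // Write 128 bits to p; p may be unaligned.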
    

    Each store instruction will set four 32-bit ints to zero in one hit.
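
    Here is a minimal sketch of a single such store. It assumes GCC or Clang on x86 with SSE2 and a 32-bit int. zero_four_ints is a hypothetical helper name. Because it uses the unaligned store, the pointer does not need any particular alignment.

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_setzero_si128, _mm_storeu_si128 */

/* Hypothetical helper: zero exactly four ints starting at p with one
   128-bit store. Assumes int is 32 bits, so one __m128i covers four ints. */
static void zero_four_ints(int *p)
{
    const __m128i zero = _mm_setzero_si128();  /* all 128 bits cleared */
    _mm_storeu_si128((__m128i *)p, zero);      /* unaligned store: p need not be 16-byte aligned */
}
```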

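    For the aligned store, p must be 16-byte aligned (passing a misaligned address to _mm_store_si128 will typically crash). That restriction is also good for speed: an aligned 16-byte store never straddles a cache-line boundary. The loop below adds one more restriction. The array must occupy a whole number of unrolled iterations (16 ints, i.e. 64 bytes, when unrolled 4 times) so that no store writes past its end. That is convenient too, because it lets us unroll the loop without a cleanup step.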
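
    The next sketch shows what the alignment rule means in practice. It assumes a C11 compiler targeting x86 with SSE2, and zero_ints_simd is a hypothetical name. The function uses the aligned store _mm_store_si128 only when the address is actually 16-byte aligned, and otherwise falls back to _mm_storeu_si128. It also requires count to be a multiple of 4 (whole 16-byte chunks), so it never writes past the buffer.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>     /* size_t */
#include <stdint.h>     /* uintptr_t */

/* Hypothetical helper: zero `count` ints, where count is a multiple of 4.
   Uses the aligned store when p is 16-byte aligned, else the unaligned one. */
static void zero_ints_simd(int *p, size_t count)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i *v = (__m128i *)p;

    if (((uintptr_t)p & 15u) == 0) {
        for (size_t i = 0; i < count; i += 4)
            _mm_store_si128(v++, zero);   /* requires 16-byte alignment */
    } else {
        for (size_t i = 0; i < count; i += 4)
            _mm_storeu_si128(v++, zero);  /* works at any address */
    }
}
```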

    Have this in a loop, and unroll the loop a few times, and you will have a crazy fast initialiser:

    #include <emmintrin.h>                              // SSE2 intrinsics.

    // Assumes int is 32 bits. roundUpToNearestMultiple() is a helper you write yourself.
    const int mr = roundUpToNearestMultiple(m, 4);      // This isn't the optimal modification of m and n, but done this way here for clarity.
    const int nr = roundUpToNearestMultiple(n, 4);      // mr * nr is now a multiple of 16 ints (64 bytes).
    
    int i = 0;
    int array[mr][nr] __attribute__ ((aligned (16)));   // GCC directive: 16-byte alignment for _mm_store_si128.
    __m128i* px = (__m128i*)array;
    const int s = mr * nr;                              // Total number of ints to zero.
    const __m128i zero128 = _mm_setzero_si128();
    
    for(i = 0; i < s; i += 16)                          // 4 stores x 4 ints = 16 ints per iteration (unrolled 4 times).
    {
        _mm_store_si128(px++, zero128);
        _mm_store_si128(px++, zero128);
        _mm_store_si128(px++, zero128);
        _mm_store_si128(px++, zero128);
    }
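
    The loop above depends on a padded, aligned array. Below is a sketch of a variant that works on any contiguous n x m int array, assuming x86 with SSE2 and a 32-bit int; zero_2d is a hypothetical name. It keeps the 4x unrolled 128-bit stores, but uses the unaligned store so no alignment is needed. It then finishes the last (n*m) % 16 ints with plain scalar stores, so no padding is needed either.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>     /* size_t */

/* Hypothetical helper: zero a contiguous row-major n x m int array.
   Main loop: 4 unaligned 128-bit stores = 16 ints per iteration.
   Tail: scalar stores for the remaining (n*m) % 16 ints. */
static void zero_2d(int *a, size_t n, size_t m)
{
    const size_t total = n * m;
    const __m128i zero = _mm_setzero_si128();
    size_t i = 0;

    for (; i + 16 <= total; i += 16) {
        __m128i *px = (__m128i *)(a + i);
        _mm_storeu_si128(px + 0, zero);
        _mm_storeu_si128(px + 1, zero);
        _mm_storeu_si128(px + 2, zero);
        _mm_storeu_si128(px + 3, zero);
    }
    for (; i < total; i++)  /* scalar tail */
        a[i] = 0;
}
```

    For int grid[n][m] you would call zero_2d(&grid[0][0], n, m). Whether this beats a plain memset(grid, 0, sizeof grid) depends on the compiler and C library; glibc's memset, for instance, already uses vector instructions on x86. It is worth timing both before committing to intrinsics.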
    

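    There is also a non-temporal variant, _mm_stream_si128, that bypasses the cache (i.e. zeroing the array won't evict other useful data from the cache). Like _mm_store_si128, it requires a 16-byte-aligned address, and the streaming stores should be followed by _mm_sfence(). It is most likely to help when the array is much larger than the cache and won't be read again soon.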
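
    Here is a sketch of that cache-bypassing variant. It assumes p is 16-byte aligned and count is a multiple of 4, and zero_ints_stream is a hypothetical name. The trailing _mm_sfence() orders the weakly ordered streaming stores before any later stores.

```c
#include <emmintrin.h>  /* _mm_stream_si128, _mm_setzero_si128; also provides _mm_sfence */
#include <stddef.h>     /* size_t */

/* Hypothetical helper: zero `count` ints with non-temporal stores.
   Assumes p is 16-byte aligned and count is a multiple of 4. */
static void zero_ints_stream(int *p, size_t count)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i *v = (__m128i *)p;

    for (size_t i = 0; i < count; i += 4)
        _mm_stream_si128(v++, zero);  /* non-temporal store: minimizes cache pollution */
    _mm_sfence();                     /* make the streaming stores globally ordered */
}
```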

    See here for SSE2 reference: http://msdn.microsoft.com/en-us/library/kcwz153a(v=vs.80).aspx
