Is there a memset() that accepts integers larger than char?

时光说笑 2020-12-05 00:25

Is there a version of memset() which sets a value that is larger than 1 byte (char)? For example, let's say we have a memset32() function, so using it we can do the following…

8 Answers
  • 2020-12-05 00:33

    Just for the record, here is an approach that fills the array with repeated memcpy(..) calls, doubling the filled region each time. Suppose we want to fill an array with 20 integers:

    --------------------
    
    First copy one:
    N-------------------
    
    Then copy it to the neighbour:
    NN------------------
    
    Then copy them to make four:
    NNNN----------------
    
    And so on:
    NNNNNNNN------------
    
    NNNNNNNNNNNNNNNN----
    
    Then copy enough to fill the array:
    NNNNNNNNNNNNNNNNNNNN
    

    This takes O(lg(num)) applications of memcpy(..).

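    #include <string.h>  /* memcpy, size_t */

    int *memset_int(int *ptr, int value, size_t num) {
        if (num < 1) return ptr;
        /* Seed the first element. */
        memcpy(ptr, &value, sizeof(int));
        /* Double the filled prefix: each pass copies the first `step` elements to ptr + start. */
        size_t start = 1, step = 1;
        for ( ; start + step <= num; start += step, step *= 2)
            memcpy(ptr + start, ptr, sizeof(int) * step);

        /* Copy whatever is left once a full doubling no longer fits. */
        if (start < num)
            memcpy(ptr + start, ptr, sizeof(int) * (num - start));
        return ptr;
    }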
    

    I thought it might be faster than a loop if memcpy(..) were optimised using some hardware block memory copy functionality, but it turns out that a simple loop is faster than the above with -O2 and -O3 (at least using MinGW GCC on Windows with my particular hardware). Without the -O switch, on a 400 MB array the code above takes 417 ms on my machine and is about twice as fast as an equivalent loop, while with optimisation they both come down to about 300 ms. That works out to roughly one nanosecond per byte, and a clock cycle is about a nanosecond. So either there is no hardware block memory copy functionality on my machine, or the memcpy(..) implementation does not take advantage of it.
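
    The "equivalent loop" and the timing comparison described above could be sketched roughly as follows. The names fill_int_loop, fill_fn and time_fill_ms are hypothetical and not from the original answer, and clock() reports CPU time rather than wall-clock time, so treat the numbers as a rough guide only.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical function-pointer type matching memset_int()'s signature. */
typedef int *(*fill_fn)(int *ptr, int value, size_t num);

/* Plain-loop counterpart: store value into each of the num slots. */
int *fill_int_loop(int *ptr, int value, size_t num) {
    for (size_t i = 0; i < num; i++)
        ptr[i] = value;
    return ptr;
}

/* Time one call of fill on buf and return the CPU time used, in milliseconds. */
double time_fill_ms(fill_fn fill, int *buf, int value, size_t num) {
    assert(fill != NULL);
    clock_t t0 = clock();
    fill(buf, value, num);
    clock_t t1 = clock();
    return 1000.0 * (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

    Because memset_int() has the same signature, it can be passed to time_fill_ms() in place of fill_int_loop to compare the two on the same buffer.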

  • 2020-12-05 00:35

    If you're just targeting 32-bit x86 you could try something like the following (VC++ example; note that MSVC does not support inline __asm blocks when compiling for x64):

    inline void memset32(void *buf, uint32_t n, int32_t c)
    {
      __asm {
      mov ecx, n
      mov eax, c
      mov edi, buf
      rep stosd
      }
    }
    

    Otherwise just make a simple loop and trust the optimizer to know what it's doing, something like:

    for(uint32_t i = 0;i < n;i++)
    {
      ((int32_t *)buf)[i] = c;
    }
    

    If you make it complicated, chances are it will end up slower than simpler, easier-to-optimize code, not to mention harder to maintain.
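
    As a self-contained function, the simple loop might look like the sketch below. The name memset32_loop is hypothetical, and the choice of size_t for the element count and uint32_t for the fill value is an assumption, not the original answer's signature. It also assumes buf is suitably aligned for uint32_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable counterpart to the asm version:
 * fill n 32-bit elements starting at buf with the value c.
 * Assumes buf is aligned for uint32_t. */
static inline void memset32_loop(void *buf, size_t n, uint32_t c) {
    assert(buf != NULL || n == 0);
    uint32_t *p = buf;
    for (size_t i = 0; i < n; i++)
        p[i] = c;
}
```

    With optimisation enabled, compilers are generally able to vectorise a loop of this shape, which is the point the answer is making.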

  • 2020-12-05 00:40

    Write your own; it's trivial, even in asm.

  • 2020-12-05 00:41

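    wmemset(3) is the wide-character version of memset. It fills with wchar_t values, which are 16 bits wide on Windows and typically 32 bits wide on Linux. I think that's the closest you're going to get in C, without a loop.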
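
    A minimal sketch of calling wmemset() from <wchar.h> is shown below. The wrapper name fill_wide is hypothetical. The element width is sizeof(wchar_t), which depends on the platform, so this is not a portable way to get exactly 32-bit fills.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical wrapper: fill count wide characters at dst with value.
 * Each element is sizeof(wchar_t) bytes, which varies by platform. */
wchar_t *fill_wide(wchar_t *dst, wchar_t value, size_t count) {
    assert(dst != NULL || count == 0);
    return wmemset(dst, value, count);
}
```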

  • 2020-12-05 00:44

    Check your OS documentation for a local version, then consider just using the loop.

    The compiler probably knows more about optimizing memory access on any particular architecture than you do, so let it do the work.

    Wrap it up as a library and compile it with all the speed-improving optimizations the compiler allows.

  • 2020-12-05 00:51
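    #include <stdint.h>  /* uint64_t, uintptr_t */
    #include <string.h>  /* memcpy */

    void memset64( void * dest, uint64_t value, uintptr_t size )
    {
      uintptr_t i;
      /* Whole 8-byte chunks; memcpy makes no alignment assumption about dest. */
      for( i = 0; i < (size & ~(uintptr_t)7); i+=8 )
      {
        memcpy( ((char*)dest) + i, &value, 8 );
      }
      /* Tail of fewer than 8 bytes: continue the pattern byte by byte. */
      for( ; i < size; i++ )
      {
        ((char*)dest)[i] = ((char*)&value)[i&7];
      }
    }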
    

    (Explanation, as requested in the comments: when you assign through a pointer, the compiler assumes that the pointer is aligned to the type's natural alignment; for uint64_t, that is 8 bytes. memcpy() makes no such assumption. On some hardware unaligned accesses are impossible, so assignment is not a suitable solution unless you know unaligned accesses work on the hardware with small or no penalty, or know that they will never occur, or both. The compiler will replace small memcpy()s and memset()s with more suitable code, so it is not as horrible as it looks; but if you do know enough to guarantee assignment will always work and your profiler tells you it is faster, you can replace the memcpy with an assignment. The second for() loop is present in case the amount of memory to be filled is not a multiple of 64 bits. If you know it always will be, you can simply drop that loop.)
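
    For the case described above where alignment is guaranteed and the size is always a whole number of 64-bit words, the assignment-based variant might look like this sketch. The name memset64_aligned and the element-count parameter are assumptions, not part of the original answer, and _Alignof requires C11.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant: dest must be aligned for uint64_t and
 * count is the number of 64-bit elements (not bytes) to fill. */
void memset64_aligned(uint64_t *dest, uint64_t value, size_t count) {
    /* Catch misaligned pointers in debug builds. */
    assert(((uintptr_t)dest % _Alignof(uint64_t)) == 0);
    for (size_t i = 0; i < count; i++)
        dest[i] = value;   /* plain store; no tail loop needed */
}
```

    Whether this beats the memcpy() version is something to confirm with a profiler, as the answer says.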
