Is there a memset() that accepts integers larger than char?

Is there a version of memset() which sets a value that is larger than 1 byte (char)? For example, let's say we have a memset32() function, so using it we can do the followi

8 answers

余生分开走

2020-12-05 00:33
Just for the record, the following fills an array by repeatedly doubling the filled region with memcpy(..). Suppose we want to fill an array with 20 integers:
```
--------------------

First copy one:
N-------------------

Then copy it to the neighbour:
NN------------------

Then copy them to make four:
NNNN----------------

And so on:
NNNNNNNN------------

NNNNNNNNNNNNNNNN----

Then copy enough to fill the array:
NNNNNNNNNNNNNNNNNNNN
```
This takes O(lg(num)) applications of memcpy(..).
```
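#include <stddef.h>  /* size_t */
#include <string.h>  /* memcpy */

int *memset_int(int *ptr, int value, size_t num) {
    if (num < 1) return ptr;
    memcpy(ptr, &value, sizeof(int));   /* seed the first element */
    /* Invariant: the first `start` elements are filled and step == start. */
    size_t start = 1, step = 1;
    for ( ; start + step <= num; start += step, step *= 2)
        memcpy(ptr + start, ptr, sizeof(int) * step);   /* double the filled prefix */

    /* Fewer than `step` elements remain; copy just those. */
    if (start < num)
        memcpy(ptr + start, ptr, sizeof(int) * (num - start));
    return ptr;
}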
```
I thought this might be faster than a loop if memcpy(..) were optimised using some hardware block memory copy functionality, but a simple loop turns out to be faster than the above with -O2 and -O3 (at least using MinGW GCC on Windows with my particular hardware). Without the -O switch, on a 400 MB array the code above is about twice as fast as an equivalent loop and takes 417 ms on my machine; with optimisation both take about 300 ms. That is roughly one nanosecond per byte, and a clock cycle is about a nanosecond. So either there is no hardware block memory copy functionality on my machine, or the memcpy(..) implementation does not take advantage of it.
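For anyone who wants to repeat the comparison, here is a minimal timing sketch. The names `fill_int_loop` and `time_fill_ms` are hypothetical and are not the code used for the numbers above; it assumes `clock()` is precise enough for the buffer size being tested, which usually means large buffers and several repeated runs.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical baseline: fill num ints with value using a plain loop. */
int *fill_int_loop(int *ptr, int value, size_t num) {
    for (size_t i = 0; i < num; i++)
        ptr[i] = value;
    return ptr;
}

/* Time a single call of a fill function and return the elapsed time,
   as reported by clock(), in milliseconds. */
double time_fill_ms(int *(*fill)(int *, int, size_t),
                    int *buf, int value, size_t num) {
    assert(fill != NULL && buf != NULL);
    clock_t t0 = clock();
    fill(buf, value, num);
    clock_t t1 = clock();
    return 1000.0 * (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Any function with the same signature, such as memset_int above, can be passed as the first argument, so both variants are measured the same way.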
执念已碎

2020-12-05 00:35
If you're only targeting 32-bit x86, you could try something like this (VC++ example; MSVC does not accept __asm blocks when compiling for x64):
```
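#include <stdint.h>

/* 32-bit x86, MSVC inline assembly: store n copies of the 32-bit value c at buf. */
inline void memset32(void *buf, uint32_t n, int32_t c)
{
  __asm {
  mov ecx, n      ; repeat count
  mov eax, c      ; value to store
  mov edi, buf    ; destination
  rep stosd       ; store EAX at [EDI], ECX times
  }
}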
```
Otherwise, just write a simple loop and trust the optimizer to know what it's doing, something like:
```
for (uint32_t i = 0; i < n; i++)
{
  ((int32_t *)buf)[i] = c;
}
```
If you make it complicated, chances are it will end up slower than simpler, easier-to-optimize code, not to mention harder to maintain.
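Wrapped up as a function, the loop might look like the sketch below. The name `memset32_portable` is hypothetical; it keeps the same argument order as the inline-asm version and assumes buf is suitably aligned for int32_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable fallback: write n copies of the 32-bit value c starting at buf.
   Assumes buf is aligned for int32_t and has room for n elements. */
void memset32_portable(void *buf, uint32_t n, int32_t c) {
    assert(buf != NULL || n == 0);
    int32_t *p = (int32_t *)buf;
    for (uint32_t i = 0; i < n; i++)
        p[i] = c;
}
```

For example, `memset32_portable(a, 4, -123456)` on an `int32_t a[5]` sets the first four elements and leaves `a[4]` untouched.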
情歌与酒

2020-12-05 00:40

Write your own; it's trivial even in asm.

眼角桃花

2020-12-05 00:41

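wmemset(3) is the wide-character version of memset. It fills wchar_t elements, whose width is implementation-defined (typically 16 bits on Windows and 32 bits on Linux). I think that's the closest you're going to get in C without a loop.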
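A minimal sketch of how wmemset could be used for this; the wrapper name `fill_wide` is hypothetical, and the element width you actually get is whatever sizeof(wchar_t) is on your platform.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Fill count wide characters at dst with ch using the standard wmemset().
   Each element is sizeof(wchar_t) bytes, which varies between platforms. */
wchar_t *fill_wide(wchar_t *dst, wchar_t ch, size_t count) {
    assert(dst != NULL);
    return wmemset(dst, ch, count);
}
```

Because the width is not fixed, this only substitutes for a memset16() or memset32() on platforms where wchar_t happens to have the size you need.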

独厮守ぢ

2020-12-05 00:44

Check your OS documentation for a local version, then consider just using the loop.

The compiler probably knows more about optimizing memory access on any particular architecture than you do, so let it do the work.

Wrap it up as a library and compile it with all the speed-improving optimizations the compiler allows.

自闭症患者

2020-12-05 00:51
```
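#include <stdint.h>  /* uint64_t, uintptr_t */
#include <string.h>  /* memcpy */

/* Fill size bytes at dest with repeated copies of the 8-byte pattern value. */
void memset64( void * dest, uint64_t value, uintptr_t size )
{
  uintptr_t i;
  /* Whole 8-byte chunks; memcpy is safe even if dest is unaligned. */
  for( i = 0; i < (size & (~(uintptr_t)7)); i+=8 )
  {
    memcpy( ((char*)dest) + i, &value, 8 );
  }
  /* Tail: copy the leading bytes of value into the remaining 0..7 bytes. */
  for( ; i < size; i++ )
  {
    ((char*)dest)[i] = ((char*)&value)[i&7];
  }
}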
```
(Explanation, as requested in the comments: when you assign through a pointer, the compiler assumes that the pointer is aligned to the type's natural alignment; for uint64_t, that is 8 bytes. memcpy() makes no such assumption. On some hardware unaligned accesses are impossible, so assignment is not a suitable solution unless you know unaligned accesses work on the hardware with small or no penalty, or know that they will never occur, or both. The compiler will replace small memcpy()s and memset()s with more suitable code, so it is not as horrible as it looks; but if you do know enough to guarantee assignment will always work, and your profiler tells you it is faster, you can replace the memcpy with an assignment. The second for() loop is present in case the amount of memory to be filled is not a multiple of 64 bits. If you know it always will be, you can simply drop that loop.)
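To illustrate the alignment point in isolation, here is a small sketch with hypothetical helper names (`store_u64_unaligned`, `load_u64_unaligned`). It shows the memcpy-based access pattern working at an address that is deliberately not 8-byte aligned, where a direct `*(uint64_t *)p = v` would not be guaranteed to work.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 64-bit value at an address that may not be 8-byte aligned.
   memcpy has no alignment requirement on its arguments. */
void store_u64_unaligned(void *p, uint64_t v) {
    assert(p != NULL);
    memcpy(p, &v, sizeof v);
}

/* Read a 64-bit value back from a possibly unaligned address. */
uint64_t load_u64_unaligned(const void *p) {
    uint64_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

For instance, storing to `buf + 1` in an `unsigned char buf[16]` touches only bytes 1 through 8 and reads back the same value.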