Question
Is memset more efficient than a for loop? So if I have
char x[500];
memset(x,0,sizeof(x));
or
char x[500];
for (int i = 0; i < 500; i++) x[i] = 0;
Which one is more efficient, and why? Is there any special instruction in the hardware to do block-level initialization?
Answer 1:
Most certainly, memset will be much faster than that loop. Note how your loop treats one character at a time, while those functions are so optimized that they set several bytes at a time, even using MMX and SSE instructions when they are available.
I think the paradigmatic example of these optimizations, which usually go unnoticed, is the GNU C library's strlen function. One would think it is at least O(n), but it effectively does the work in n/4 or n/8 steps depending on the architecture (yes, I know, in big-O terms that is the same, but you actually get about an eighth of the time). How? Tricky, but nicely done: see the glibc strlen source.
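For a flavor of how that works, here is a minimal sketch of the word-at-a-time idea (my own illustration, not the actual glibc code; it assumes 64-bit words and glosses over strict-aliasing and over-read subtleties that real implementations handle deliberately):

#include <stddef.h>
#include <stdint.h>

/* Sketch: scan 8 bytes per iteration and use a bit trick to detect
   whether any byte in the word is zero. */
static size_t strlen_wordwise(const char *s) {
    const char *start = s;
    /* Handle bytes one at a time until the pointer is 8-byte aligned. */
    while (((uintptr_t)s & 7) != 0) {
        if (*s == '\0') return (size_t)(s - start);
        s++;
    }
    const uint64_t *w = (const uint64_t *)s;
    for (;;) {
        uint64_t v = *w;
        /* (v - 0x0101...) & ~v & 0x8080... is nonzero iff some byte of v is 0. */
        if ((v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL) {
            const char *p = (const char *)w;
            while (*p != '\0') p++;      /* locate the zero byte within the word */
            return (size_t)(p - start);
        }
        w++;
    }
}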
Answer 2:
Well, why don't we take a look at the generated assembly code, with full optimization under VS 2010?
char x[500];
char y[500];
int i;
memset(x, 0, sizeof(x) );
003A1014 push 1F4h
003A1019 lea eax,[ebp-1F8h]
003A101F push 0
003A1021 push eax
003A1022 call memset (3A1844h)
And your loop...
char x[500];
char y[500];
int i;
for( i = 0; i < 500; ++i )
{
x[i] = 0;
00E81014 push 1F4h
00E81019 lea eax,[ebp-1F8h]
00E8101F push 0
00E81021 push eax
00E81022 call memset (0E81844h)
/* note that this is *replacing* the loop,
not being called once for each iteration. */
}
So, under this compiler, the generated code is exactly the same. memset is fast, and the compiler is smart enough to know that you are doing the same thing as calling memset once anyway, so it does it for you.
If the compiler actually left the loop as-is, it would likely be slower, since you can set more than one byte-sized block at a time (i.e., you could at least unroll the loop a bit). You can assume that memset will be at least as fast as a naive implementation such as the loop. Try it under a debug build and you will notice that the loop is not replaced.
That said, it depends on what the compiler does for you. Looking at the disassembly is always a good way to know exactly what is going on.
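If you want to repeat this experiment with your own toolchain, a minimal setup might look like the following (the gcc/clang -S invocation is an assumption about your environment; MSVC has its own assembly-listing option):

/* Compile with and without optimization and compare the assembly:
       gcc -O2 -S zero.c -o zero_opt.s
       gcc -O0 -S zero.c -o zero_dbg.s
   With optimization, the byte-by-byte loop below is typically recognized
   and compiled as a call to memset (or an inline equivalent); at -O0 the
   loop is usually kept as written. */
#include <stddef.h>

void zero(char *x, size_t n) {
    for (size_t i = 0; i < n; i++)
        x[i] = 0;
}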
Answer 3:
It really depends on the compiler and library. For older or simpler compilers, memset may be implemented as a plain library function and would not perform better than a custom loop.
For nearly all compilers that are worth using, memset is an intrinsic function and the compiler will generate optimized, inline code for it.
Others have suggested profiling and comparing, but I wouldn't bother. Just use memset. The code is simple and easy to understand. Don't worry about it until your benchmarks tell you that this part of the code is a performance hotspot.
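If you do decide to measure it anyway, a rough timing sketch using only the standard library might look like this (my own harness, not from the answer; results vary widely with compiler flags and hardware, and the volatile sink is only a best-effort guard against the work being optimized away):

#include <stdio.h>
#include <string.h>
#include <time.h>

#define N    500
#define REPS 1000000L

int main(void) {
    static char x[N];
    volatile char sink = 0;   /* force the buffer to be read back each pass */

    clock_t t0 = clock();
    for (long r = 0; r < REPS; r++) {
        memset(x, 0, sizeof(x));
        sink ^= x[r % N];
    }
    clock_t t1 = clock();

    for (long r = 0; r < REPS; r++) {
        for (int i = 0; i < N; i++) x[i] = 0;
        sink ^= x[r % N];
    }
    clock_t t2 = clock();

    printf("memset: %.3f s, loop: %.3f s (sink=%d)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC,
           (int)sink);
    return 0;
}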
Answer 4:
The answer is "it depends". memset MAY be more efficient, or it may internally use a for loop. I can't think of a case where memset will be less efficient. In this case, it may turn into a more efficient loop: your loop iterates 500 times, setting one byte of the array to 0 on each iteration. On a 64-bit machine, you could loop through setting 8 bytes (a long long) at a time, which would be almost 8 times quicker, and then deal with the remaining 500 % 8 = 4 bytes at the end.
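To make that concrete, here is a sketch of the hand-written version described above: clear eight bytes per iteration with a 64-bit store, then finish the leftover bytes one at a time (my own illustration; real memset implementations additionally handle alignment and use wider SIMD stores where available):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void zero_words(char *p, size_t n) {
    const uint64_t zero = 0;
    size_t i = 0;
    /* Main loop: one 8-byte store per iteration. */
    for (; i + 8 <= n; i += 8)
        memcpy(p + i, &zero, sizeof zero);
    /* Tail: the remaining n % 8 bytes (4 of them when n == 500). */
    for (; i < n; i++)
        p[i] = 0;
}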
EDIT: in fact, this is what memset does in glibc:
http://repo.or.cz/w/glibc.git/blob/HEAD:/string/memset.c
As Michael pointed out, in certain cases (where the array length is known at compile time), the C compiler can inline memset, getting rid of the overhead of the function call. glibc also has assembly-optimized versions of memset for most major platforms, like amd64:
http://repo.or.cz/w/glibc.git/blob/HEAD:/sysdeps/x86_64/memset.S
Answer 5:
Good compilers will recognize the for loop and replace it with either an optimal inline sequence or a call to memset. They will also replace memset with an optimal inline sequence when the buffer size is small.
In practice, with an optimizing compiler the generated code (and therefore performance) will be identical.
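As an illustration of the small-buffer case (my own example, not taken from the answer): for a small, fixed size known at compile time, optimizing compilers commonly expand the memset call into a couple of inline stores instead of emitting a real call. Exact codegen varies by compiler and target.

#include <string.h>

struct point { double x, y; };

void reset(struct point *p) {
    /* With optimization, this typically becomes two 8-byte stores
       rather than a call into the C library. */
    memset(p, 0, sizeof *p);
}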
Answer 6:
Agree with the above: it depends. But for sure, memset is faster than or equal to the for loop. If you are uncertain of your environment or too lazy to test, take the safe route and go with memset.
Answer 7:
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

void fill_array(void* array, size_t size_of_item, size_t length, const void* value) {
    const uint8_t* bytes = value;
    uint8_t first_byte = bytes[0];
    if (size_of_item == 1) {
        memset(array, first_byte, length);
        return;
    }
    // size_of_item > 1 here: check whether every byte of the value is the same.
    bool all_bytes_are_identical = true;
    for (size_t byte_index = 1; byte_index < size_of_item; byte_index++) {
        if (bytes[byte_index] != first_byte) {
            all_bytes_are_identical = false;
            break;
        }
    }
    if (all_bytes_are_identical) {
        // Every byte of the pattern is identical, so one memset covers the whole array.
        memset(array, first_byte, size_of_item * length);
        return;
    }
    // Mixed bytes: copy the pattern element by element.
    for (size_t index = 0; index < length; index++) {
        memcpy((uint8_t*)array + size_of_item * index, value, size_of_item);
    }
}
memset is more efficient, but it can't handle values whose bytes are not all identical (the case where all_bytes_are_identical is false), so you need to wrap it. This is my variant; it works on both little-endian and big-endian systems.
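A small usage example for the wrapper above (my own addition, assuming fill_array is in scope): a value whose bytes are all identical takes the memset path, while a mixed-byte value falls back to the per-element memcpy loop.

#include <stdio.h>

int main(void) {
    int a[8];
    int v = -1;                        /* all bytes 0xFF: takes the memset path */
    fill_array(a, sizeof a[0], 8, &v);

    int b[8];
    int w = 0x12345678;                /* mixed bytes: takes the memcpy path */
    fill_array(b, sizeof b[0], 8, &w);

    printf("%d %d\n", a[0], b[0]);     /* prints: -1 305419896 */
    return 0;
}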
Source: https://stackoverflow.com/questions/7367677/is-memset-more-efficient-than-for-loop-in-c