I learned that memset(ptr, 0, nbytes) is really fast, but is there a faster way (at least on x86)?
I assume that memset uses mov, however w
If I remember correctly (from a couple of years ago), one of the senior developers was talking about a fast way to bzero() on PowerPC (specs said we needed to zero almost all the memory on power up). It might not translate well (if at all) to x86, but it could be worth exploring.
The idea was to load a data cache line, clear that data cache line, and then write the cleared data cache line back to memory.
For what it is worth, I hope it helps.