I've been writing in x86 assembly lately (for fun) and was wondering whether or not rep-prefixed string instructions actually have a performance edge on modern processors o
Both AMD's and Intel's optimization guides devote a lot of space to questions like this. Advice in this area has a "half-life", because different CPU generations behave differently. For example:
The Intel Architecture Optimization Manual gives performance comparison figures for various block copy techniques (including rep stosd) in Table 7-2, "Relative Performance of Memory Copy Routines", pg. 7-37f., for different CPUs, and again what's fastest on one might not be fastest on others.
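Since the answer differs between CPUs, the most reliable check is to time the candidates on your own machine. Here is a rough sketch of how that might look in C with GCC/Clang inline assembly. The helper names (copy_rep_movsb, copy_memcpy, seconds_per_copy) are made up for illustration, the non-x86 fallback is only there so the code compiles anywhere, and clock() is a coarse timer, so treat any numbers as indicative at best.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copy n bytes with `rep movsb`.
   Assumes x86-64 and GCC/Clang extended asm; elsewhere it falls back
   to memcpy so the sketch still compiles. The ABI guarantees the
   direction flag is clear on function entry, so the copy runs forward. */
static void copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n) /* RDI, RSI, RCX */
                     :
                     : "memory");
#else
    memcpy(dst, src, n);
#endif
}

/* Baseline: whatever the C library's memcpy does on this machine. */
static void copy_memcpy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

typedef void (*copy_fn)(void *, const void *, size_t);

/* Average seconds per copy over `reps` repetitions (reps > 0).
   Also checks that the routine really copied the data. */
static double seconds_per_copy(copy_fn fn, void *dst, const void *src,
                               size_t n, int reps)
{
    clock_t t0 = clock();
    for (int i = 0; i < reps; i++)
        fn(dst, src, n);
    clock_t t1 = clock();

    assert(memcmp(dst, src, n) == 0);
    return (double)(t1 - t0) / CLOCKS_PER_SEC / reps;
}
```

To use it, call seconds_per_copy once with copy_rep_movsb and once with copy_memcpy for each buffer size you care about (a few bytes, a few KB, several MB) and compare the results. The ranking can change with size and alignment as well as with the CPU model, and an optimizing compiler may simplify the memcpy loop, so it is worth inspecting the generated assembly before trusting the numbers.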
In many cases, recent x86 CPUs (those with the SSE4.2 "string" operations) can perform string operations in the SIMD unit; see this investigation.
To follow up on all this (and to keep yourself updated when things inevitably change again), read Agner Fog's optimization guides and blog.