I'm doing some x64 assembly with Visual C++ 2010 and masm ('fast call' calling convention).
So let's say I have a function in C++:
extern "C" v
You can multiply by 0x0101010101010101 to copy the lowest byte into all the other bytes (assuming the rest were all zero to begin with). It's slightly annoying because there is no imul r64, r64, imm64 (the three-operand form of imul only takes a sign-extended imm32), but you could do this:
mov rax, 0101010101010101h ; 64-bit immediate, written in MASM hex syntax
imul rax, rdx ; at least as fast as mul rdx on all CPUs
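To illustrate the arithmetic, here is a minimal C++ sketch of the same multiply. It assumes a 64-bit unsigned value whose upper seven bytes are already zero. The name broadcast_byte is hypothetical, and the assert restates the precondition from the answer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the C++ equivalent of
//   mov  rax, 0101010101010101h
//   imul rax, rdx
// Precondition (as in the answer): only the lowest byte of x may be non-zero.
std::uint64_t broadcast_byte(std::uint64_t x) {
    assert(x <= 0xFF && "upper bits must be zero");
    // Multiplying by 0x0101...01 sums x<<0, x<<8, ..., x<<56.
    // Because x < 256, the shifted copies land in disjoint bytes, so no carries occur.
    return x * 0x0101010101010101ULL;
}
```

For example, broadcast_byte(0xAB) yields 0xABABABABABABABAB, and broadcast_byte(0xFF) yields all ones.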
If rdx is not of the required form (in other words, if it has extra bits set above the low byte), just add a movzx eax, dl in front and move the constant into RDX or another register instead, then do the imul. Zero-extending into a different register is deliberate: movzx edx, dl can't benefit from mov-elimination on Intel CPUs.
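Here is a minimal C++ sketch of why the zero-extension matters, under the same assumptions. The helper names are hypothetical. The cast to std::uint8_t plays the role of movzx eax, dl. Without it, a stray bit above the low byte leaks into the product.

```cpp
#include <cstdint>

// Multiply without clearing the upper bits first: only correct if x < 256.
constexpr std::uint64_t broadcast_unmasked(std::uint64_t x) {
    return x * 0x0101010101010101ULL;
}

// Rough equivalent of:
//   movzx eax, dl
//   mov   rdx, 0101010101010101h
//   imul  rax, rdx
constexpr std::uint64_t broadcast_low_byte(std::uint64_t x) {
    std::uint64_t low = static_cast<std::uint8_t>(x);  // keep only the low byte (movzx)
    return low * 0x0101010101010101ULL;
}

// With bit 8 set, the unmasked multiply adds 0x0101010101010100 to the broadcast.
static_assert(broadcast_unmasked(0x1AB) == 0xACACACACACACACABULL, "extra bit leaks");
static_assert(broadcast_low_byte(0x1AB) == 0xABABABABABABABABULL, "masked is correct");
```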
If you don't like the code size (mov r64, imm64 is already 10 bytes by itself), just put that constant in your data segment and use it as the memory operand of imul.