Many SSE "mov" instructions specify that they are moving floating-point values. For example:
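I think I've found the answer: some microarchitectures execute floating-point instructions on different execution units than integer instructions. When a result produced in one "domain" (integer or floating point) is consumed by an instruction in the other, forwarding the data between those units can add extra latency. So you get better overall latency when a stream of instructions stays within the same domain, which is why the "mov" variants are typed even though they move the same bits. This is covered in good detail in Agner Fog's microarchitecture manual, in the section titled "Data Bypass Delays": http://www.agner.org/optimize/microarchitecture.pdf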
I found this explanation in a similar SO question: "Difference between MOVDQA and MOVAPS x86 instructions?"
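To make the domain idea concrete, here is a minimal sketch in C with SSE2 intrinsics. It is my own illustration, not something taken from the manual. Both functions clear the sign bit of four floats and produce identical bits; the only difference is whether the load, AND, and store are expressed with floating-point-typed or integer-typed operations. The function names (`abs_float_domain`, `abs_int_domain`, `abs_variants_agree`) are made up for this example, and the instruction names in the comments are what a compiler would typically emit; a compiler is free to pick a different encoding.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE ones) */
#include <string.h>     /* memcmp */

/* |x| for four floats, written with floating-point-typed operations.
   Typically compiles to movups / andps / movups. */
void abs_float_domain(const float *in, float *out)
{
    const __m128 mask = _mm_castsi128_ps(_mm_set1_epi32(0x7fffffff));
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_ps(out, _mm_and_ps(v, mask));
}

/* Same result bit for bit, written with integer-typed operations.
   Typically compiles to movdqu / pand / movdqu. */
void abs_int_domain(const float *in, float *out)
{
    const __m128i mask = _mm_set1_epi32(0x7fffffff);
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_and_si128(v, mask));
}

/* Returns 1 if both variants produce identical bit patterns for `in`. */
int abs_variants_agree(const float *in)
{
    float a[4], b[4];
    abs_float_domain(in, a);
    abs_int_domain(in, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Since the results are the same, the choice only matters for what surrounds the code: if the values feed `addps`/`mulps`, the float-typed version keeps everything in one domain, and if they feed `paddd` and friends, the integer-typed one does. Whether a mismatch actually costs anything depends on the microarchitecture.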