Intel\'s official optimization guide has a chapter on converting from MMX commands to SSE where they state the fallowing statment:
Computation instruc
Data that's aligned on a 16 byte boundary will have a memory address that's an even number — strictly speaking, a multiple of two. Each byte is 8 bits, so to align on a 16 byte boundary, you need to align to each set of two bytes.
Similarly, memory aligned on a 32 bit (4 byte) boundary would have a memory address that's a multiple of four, because you group four bytes together to form a 32 bit word.