I have the following function:
template
void SSE_vectormult(T * A, T * B, int size)
{
__m128d a;
__m128d b;
__m128d c;
do
Let me try and answer why your code works in Linux and not Windows. Code compiled in 64-bit mode has the stack aligned by 16 bytes. However, code compiled in 32-bit mode is only 4 byte aligned on windows and is not guaranteed to be 16 byte aligned on Linux.
GCC defaults to 64-bit mode on 64-bit systems. However MSVC defaults to 32-bit mode even on 64-bit systems. So I'm going to guess that you did not compile your code in 64-bit mode in windows and _mm_load_pd and _mm_store_pd both need 16 byte aligned addresses so the code crashes.
You have at least three different solutions to get your code working in Windows as well.
_mm_storeu_pd) The best solution is the third solution since then your code will work on 32 bit systems and on older systems where unaligned loads/stores are much slower.