Function crashes when using _mm_load_pd

旧时难觅i 2020-12-12 04:34

I have the following function:

template <typename T>
void SSE_vectormult(T * A, T * B, int size)
{

    __m128d a;
    __m128d b;
    __m128d c;
    do         


        
3 Answers
  •  旧时难觅i
    2020-12-12 05:19

    Let me try to explain why your code works on Linux but not on Windows. Code compiled in 64-bit mode has the stack aligned to 16 bytes. Code compiled in 32-bit mode, however, is only 4-byte aligned on Windows, and is not guaranteed to be 16-byte aligned on Linux either.

    GCC defaults to 64-bit mode on 64-bit systems, whereas MSVC defaults to 32-bit mode even on 64-bit systems. My guess is that you did not compile your code in 64-bit mode on Windows. Since _mm_load_pd and _mm_store_pd both require 16-byte-aligned addresses, the code crashes when it is given an address that is not aligned.
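The alignment requirement can be checked directly. The sketch below is only an illustration: `is_aligned_16` and `checked_load_pd` are hypothetical helper names that do not appear in the question. In a debug build the wrapper turns a misaligned `_mm_load_pd` into a failed assertion rather than an access violation.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2: __m128d, _mm_load_pd

// Hypothetical helper: true when p is a valid address for _mm_load_pd / _mm_store_pd.
inline bool is_aligned_16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical wrapper: asserts the alignment precondition before the aligned load.
inline __m128d checked_load_pd(const double* p) {
    assert(is_aligned_16(p) && "_mm_load_pd needs a 16-byte-aligned address");
    return _mm_load_pd(p);
}
```

Calling `is_aligned_16(A)` and `is_aligned_16(B)` at the top of the failing function would show whether the 32-bit build is passing misaligned pointers.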

    There are at least three ways to get your code working on Windows as well.

    1. Compile your code in 64-bit mode.
    2. Use unaligned loads and stores (e.g. _mm_loadu_pd and _mm_storeu_pd).
    3. Align the data yourself, as Paul R suggested.
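As a rough sketch of option 2: the body of the question's function is cut off, so the code below assumes it multiplies two arrays element by element. `multiply_unaligned` is a hypothetical stand-in, not the original function.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2: _mm_loadu_pd, _mm_mul_pd, _mm_storeu_pd

// Hypothetical stand-in for the question's function: a[i] *= b[i].
// Unaligned loads/stores are used, so a and b may have any alignment.
inline void multiply_unaligned(double* a, const double* b, int size) {
    assert(size >= 0);
    int i = 0;
    for (; i + 2 <= size; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);  // two doubles from a
        __m128d vb = _mm_loadu_pd(b + i);  // two doubles from b
        _mm_storeu_pd(a + i, _mm_mul_pd(va, vb));
    }
    for (; i < size; ++i) {
        a[i] *= b[i];  // scalar tail when size is odd
    }
}
```

This version does not fault on misaligned input, at the cost of slower memory access on older processors.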

    The third solution is the best one: your code will then also work on 32-bit systems, and it will stay fast on older systems where unaligned loads and stores are much slower.
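A minimal sketch of aligning the data yourself, again assuming an element-wise multiply because the original function body is truncated. `multiply_aligned`, `alloc_aligned_doubles` and `free_aligned_doubles` are hypothetical names. `_mm_malloc` and `_mm_free` are the aligned allocation helpers that ship with the intrinsics headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics; also brings in _mm_malloc / _mm_free

// Hypothetical aligned variant: a[i] *= b[i]. Both pointers must be 16-byte aligned.
inline void multiply_aligned(double* a, const double* b, int size) {
    assert(reinterpret_cast<std::uintptr_t>(a) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(b) % 16 == 0);
    int i = 0;
    for (; i + 2 <= size; i += 2) {
        __m128d va = _mm_load_pd(a + i);  // aligned load, safe because of the checks above
        __m128d vb = _mm_load_pd(b + i);
        _mm_store_pd(a + i, _mm_mul_pd(va, vb));
    }
    for (; i < size; ++i) {
        a[i] *= b[i];  // scalar tail when size is odd
    }
}

// Heap buffer of n doubles aligned to 16 bytes; release it with free_aligned_doubles.
inline double* alloc_aligned_doubles(std::size_t n) {
    return static_cast<double*>(_mm_malloc(n * sizeof(double), 16));
}

inline void free_aligned_doubles(double* p) {
    _mm_free(p);
}
```

For arrays on the stack or in static storage, declaring them as `alignas(16) double data[N];` (C++11) gives the same guarantee without a heap allocation.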
