Function crashes when using _mm_load_pd

Posted by 岁酱吖の on 2019-11-28 14:23:15
Paul R

Your data is not guaranteed to be 16-byte aligned, as required by aligned SSE loads such as _mm_load_pd. Either use the unaligned variant, _mm_loadu_pd:

    a = _mm_loadu_pd(A);
    ...
    a = _mm_loadu_pd(A2ptr);
    b = _mm_loadu_pd(B2ptr);

or make sure that your data is correctly aligned where possible, e.g. for statics or locals:

    alignas(16) double A2[2], B2[2], C[2];    // C++11, or C11 with <stdalign.h>

or without C++11, using compiler-specific language extensions:

    __attribute__ ((aligned(16))) double A2[2], B2[2], C[2];   // gcc/clang/ICC/et al

    __declspec (align(16))        double A2[2], B2[2], C[2];   // MSVC

You could use #ifdef to #define an ALIGN(x) macro that works on the target compiler.
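A minimal sketch of such a macro follows. The names ALIGN, is_aligned and check_buffers are made up for illustration and are not part of any standard header, and the compiler detection only covers MSVC and the GCC-compatible family.

```cpp
#include <cassert>
#include <cstdint>

// ALIGN(x) is a hypothetical helper macro chosen for this example.
#if defined(_MSC_VER)
#  define ALIGN(x) __declspec(align(x))          // MSVC
#elif defined(__GNUC__) || defined(__clang__)
#  define ALIGN(x) __attribute__((aligned(x)))   // gcc/clang/ICC et al
#else
#  error "ALIGN(x) is not defined for this compiler"
#endif

// Returns true when the address p is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::uintptr_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Static buffers that are safe to pass to _mm_load_pd / _mm_store_pd.
static ALIGN(16) double A2[2] = {1.0, 2.0};
static ALIGN(16) double B2[2] = {3.0, 4.0};

// Sanity check that the macro really produced 16-byte aligned storage.
inline void check_buffers() {
    assert(is_aligned(A2, 16));
    assert(is_aligned(B2, 16));
}
```

With C++11 or later available on every target, plain alignas(16) does the same job without any macro.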

Let me try to answer why your code works on Linux and not on Windows. Code compiled in 64-bit mode has the stack aligned to 16 bytes. However, in code compiled in 32-bit mode the stack is only 4-byte aligned on Windows, and it is not guaranteed to be 16-byte aligned on Linux.

GCC defaults to 64-bit mode on 64-bit systems. However, MSVC defaults to 32-bit mode even on 64-bit systems. So my guess is that you did not compile your code in 64-bit mode on Windows. Since _mm_load_pd and _mm_store_pd both need 16-byte aligned addresses, the code crashes.
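One way to test this guess is to look at the pointer width of the build and at the actual addresses before they reach the aligned intrinsics. The sketch below is only a diagnostic aid; offset_from_16, pointer_bits and require_16 are hypothetical helper names, not part of any SSE header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte offset of p from the previous 16-byte boundary (0 means aligned).
inline std::size_t offset_from_16(const void* p) {
    return static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(p) % 16);
}

// Pointer width of the current build: 32 for a 32-bit target, 64 for a 64-bit one.
inline int pointer_bits() {
    return static_cast<int>(sizeof(void*) * 8);
}

// Debug-build guard: the assertion fires with a readable message
// instead of the program faulting inside _mm_load_pd or _mm_store_pd.
inline const double* require_16(const double* p) {
    assert(offset_from_16(p) == 0 && "pointer is not 16-byte aligned");
    return p;
}
```

If pointer_bits() reports 32 and offset_from_16 on a plain local array gives 8 (or 4 or 12), the crash is the alignment problem described above.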

You have at least three different solutions to get your code working in Windows as well.

  1. Compile your code in 64-bit mode.
  2. Use unaligned loads and stores (e.g. _mm_loadu_pd and _mm_storeu_pd).
  3. Align the data yourself, as Paul R suggested.

The third solution is the best one, since your code will then also work on 32-bit systems and will stay fast on older systems where unaligned loads/stores are much slower.
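As a rough illustration of combining the second and third options, the sketch below aligns the data with alignas(16) and still falls back to the unaligned intrinsics if a caller passes a misaligned pointer. The function names mul2 and demo_mul2 are invented for this example, and it assumes an x86 target with SSE2 and a C++11 compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2: __m128d, _mm_load_pd, _mm_loadu_pd, _mm_mul_pd, ...

// Multiplies two pairs of doubles element-wise and writes the result to out.
// Takes the aligned path only when all three pointers are 16-byte aligned.
inline void mul2(const double* a, const double* b, double* out) {
    const std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(a) |
                                reinterpret_cast<std::uintptr_t>(b) |
                                reinterpret_cast<std::uintptr_t>(out);
    if ((bits & 15u) == 0) {
        _mm_store_pd(out, _mm_mul_pd(_mm_load_pd(a), _mm_load_pd(b)));
    } else {
        _mm_storeu_pd(out, _mm_mul_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
    }
}

// Usage with explicitly aligned locals, so the aligned path is taken
// even in a 32-bit build where the stack itself may not be 16-byte aligned.
inline double demo_mul2() {
    alignas(16) double A2[2] = {1.5, 2.0};
    alignas(16) double B2[2] = {4.0, 0.5};
    alignas(16) double C[2];
    mul2(A2, B2, C);
    assert(C[0] == 6.0 && C[1] == 1.0);
    return C[0] + C[1];
}
```

The runtime check costs a few instructions per call, so in a hot loop it is usually better to guarantee alignment once at allocation time and call the aligned intrinsics directly.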

If you look at http://msdn.microsoft.com/en-us/library/cww3b12t(v=vs.90).aspx you can see that the function _mm_load_pd is declared as:

    __m128d _mm_load_pd (double *p);

So, in your code, A should be a pointer to double, but its type depends on T, which is a template parameter. Make sure that you are calling your SSE_vectormult function with the right template parameters, or just remove the template and use the double type instead.
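If the template has to stay, one option is to make the requirement explicit with a static_assert. The sketch below uses a hypothetical multiply_pairs function as a stand-in, since the real signature of SSE_vectormult is not shown here, and it uses unaligned loads so that it does not depend on how the caller allocated the arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <emmintrin.h>  // SSE2 double-precision intrinsics

// Hypothetical stand-in for a templated SSE routine.
template <typename T>
void multiply_pairs(const T* a, const T* b, T* out, std::size_t n) {
    // Instantiating with float, int, ... now fails with a clear message.
    static_assert(std::is_same<T, double>::value,
                  "the _pd intrinsics operate on double only");
    std::size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        // Two doubles per iteration.
        _mm_storeu_pd(out + i, _mm_mul_pd(_mm_loadu_pd(a + i), _mm_loadu_pd(b + i)));
    }
    for (; i < n; ++i) {
        out[i] = a[i] * b[i];  // scalar tail for an odd element count
    }
}

// Usage: T is deduced as double from the arguments.
inline void demo_multiply_pairs() {
    double a[3] = {1.0, 2.0, 3.0};
    double b[3] = {2.0, 2.0, 2.0};
    double c[3];
    multiply_pairs(a, b, c, 3);
    assert(c[0] == 2.0 && c[1] == 4.0 && c[2] == 6.0);
}
```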
