Is it possible to cast floats directly to __m128 if they are 16 byte aligned?

时光取名叫无心 2020-12-06 05:26

Is it safe/possible/advisable to cast floats directly to __m128 if they are 16 byte aligned?

I noticed using _mm_load_ps and _mm_stor

5 answers
  • 2020-12-06 05:51

    There are several ways to put float values into SSE registers; the following intrinsics can be used:

    __m128 sseval;
    float a, b, c, d;
    
    sseval = _mm_set_ps(a, b, c, d);  // lanes, lowest first: [ d, c, b, a ]
    sseval = _mm_setr_ps(a, b, c, d); // lanes, lowest first: [ a, b, c, d ]
    sseval = _mm_load_ps(&a);         // ill-specified here - "a" not float[] ...
                                      // same as _mm_setr_ps(a[0], a[1], a[2], a[3])
                                      // if you have an actual 16-byte-aligned array
    
    sseval = _mm_set1_ps(a);          // make vector from [ a, a, a, a ]
    sseval = _mm_load1_ps(&a);        // load from &a, replicate - same as previous
    
    sseval = _mm_set_ss(a);           // make vector from [ a, 0, 0, 0 ]
    sseval = _mm_load_ss(&a);         // load from &a, zero others - same as prev
    

    The compiler will often create the same instructions no matter whether you state _mm_set_ss(val) or _mm_load_ss(&val) - try it and disassemble your code.

    It can, in some cases, be advantageous to write _mm_set_ss(*valptr) instead of _mm_load_ss(valptr) ... depends on (the structure of) your code.
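
The lane order of these intrinsics is easy to get backwards, so here is a minimal sketch that stores each vector back to memory to inspect it. The helper names `lanes` and `lanes_of_load` are made up for illustration; only the `_mm_*` calls come from `<xmmintrin.h>`.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <array>
#include <cassert>
#include <cstdint>

// Copy the four lanes of v into an array, lowest lane (element 0) first.
// Hypothetical helper, not part of any library.
std::array<float, 4> lanes(__m128 v) {
    alignas(16) float out[4];
    _mm_store_ps(out, v);  // aligned store: out[0] receives the lowest lane
    return {out[0], out[1], out[2], out[3]};
}

// Load four floats from a 16-byte-aligned array and return the lanes.
std::array<float, 4> lanes_of_load(const float* p) {
    // _mm_load_ps requires a 16-byte-aligned address.
    assert(reinterpret_cast<std::uintptr_t>(p) % 16 == 0);
    return lanes(_mm_load_ps(p));
}
```

With `alignas(16) float a[4] = {1, 2, 3, 4}`, `lanes_of_load(a)` should match `lanes(_mm_setr_ps(a[0], a[1], a[2], a[3]))`, while `_mm_set_ps` with the same argument order yields the lanes reversed.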

  • 2020-12-06 05:56

    What makes you think that _mm_load_ps and _mm_store_ps "add a significant overhead"? This is the normal way to load/store float data to/from SSE registers, assuming the source/destination is memory (and any other method eventually boils down to this anyway).
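
As a rough sketch of that load/compute/store pattern, the function below scales an aligned float array in place. The name `scale_in_place` and its preconditions are assumptions made for this example, not something from the question.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cassert>
#include <cstddef>
#include <cstdint>

// Multiply n floats by k, four at a time (hypothetical helper).
// Preconditions assumed here: data is 16-byte aligned and n is a multiple of 4.
void scale_in_place(float* data, std::size_t n, float k) {
    assert(reinterpret_cast<std::uintptr_t>(data) % 16 == 0);
    assert(n % 4 == 0);
    const __m128 factor = _mm_set1_ps(k);   // [ k, k, k, k ]
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 v = _mm_load_ps(data + i);   // aligned load of 4 floats
        v = _mm_mul_ps(v, factor);          // lane-wise multiply
        _mm_store_ps(data + i, v);          // aligned store back to memory
    }
}
```

If the alignment cannot be guaranteed, `_mm_loadu_ps` and `_mm_storeu_ps` are the unaligned counterparts. Compiling with optimisation and reading the disassembly is the most reliable way to see what the loads and stores actually cost.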

  • 2020-12-06 06:00

    Going by http://msdn.microsoft.com/en-us/library/ayeb3ayc.aspx, it's possible but not safe or recommended.

    You should not access the __m128 fields directly.


    And here's the reason why:

    http://social.msdn.microsoft.com/Forums/en-US/vclanguage/thread/766c8ddc-2e83-46f0-b5a1-31acbb6ac2c5/

    1. Casting float* to __m128 will not work. The C++ compiler converts an assignment to the __m128 type into an SSE instruction that loads 4 float numbers into an SSE register. Even assuming this cast compiles, it doesn't create working code, because the SSE load instruction is not generated.

    A __m128 variable is not actually a variable or an array. It is a placeholder for an SSE register, which the C++ compiler replaces with SSE assembly instructions. To understand this better, read the Intel Assembly Programming Reference.

  • 2020-12-06 06:09

    The obvious issue I can see is that you're then aliasing (referring to a memory location by more than one pointer type), which can confuse the optimiser. A typical issue with aliasing is that, since the optimiser doesn't observe that you're modifying a memory location through the original pointer, it considers it to be unchanged.

    Since you're obviously not using the optimiser to its full extent (or you'd be willing to rely on it to emit the correct SSE instructions) you'll probably be OK.

    The problem with using the intrinsics yourself is that they're designed to operate on SSE registers, and can't use the instruction variants that load from a memory location and process it in a single instruction.

  • 2020-12-06 06:12

    A few years have passed since the question was asked. To answer the question my experience shows:

    YES

    reinterpret_cast-casting a float* into a __m128* and vice versa is good as long as that float* is 16-byte-aligned - example (in MSVC 2012):

    __declspec( align( 16 ) ) float f[4];
    return _mm_mul_ps( _mm_set_ps1( 1.f ), *reinterpret_cast<__m128*>( f ) );
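
For compilers other than MSVC, a comparable sketch can use the standard `alignas(16)` in place of `__declspec( align( 16 ) )`. The helper names below are hypothetical. As far as I know, GCC and Clang declare `__m128` with the `may_alias` attribute, which is what keeps the cast clear of strict-aliasing trouble there; treat that as an assumption to check against your compiler's headers.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cassert>
#include <cstdint>

// True if p is aligned suitably for _mm_load_ps or for use as a __m128*.
bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Variant 1: dereference a cast pointer, as in the answer above.
__m128 load_by_cast(const float* p) {
    assert(is_aligned16(p));  // a misaligned address may fault
    return *reinterpret_cast<const __m128*>(p);
}

// Variant 2: the explicit aligned-load intrinsic.
__m128 load_by_intrinsic(const float* p) {
    assert(is_aligned16(p));
    return _mm_load_ps(p);
}

// Compare all four lanes; _mm_movemask_ps collects the sign bit of each lane
// of the comparison mask, so 0xF means every lane compared equal.
bool same_lanes(__m128 a, __m128 b) {
    return _mm_movemask_ps(_mm_cmpeq_ps(a, b)) == 0xF;
}
```

Given `alignas(16) float f[4]`, both variants should produce the same vector, and an optimising compiler will typically emit the same load for either. The intrinsic states the intent more directly and has an unaligned sibling, `_mm_loadu_ps`, for pointers such as `f + 1`.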
    