SSE and C++ containers

Submitted by 百般思念 on 2019-11-28 10:04:43
Ben Voigt

The obvious thing that could have gone wrong would be if v wasn't aligned properly.

But it's allocated dynamically by vector, so it isn't subject to stack misalignment issues.

However, as phooji correctly points out, a "template" or "prototype" value is passed to the std::vector constructor, which copies it into all the elements of the vector. It's this parameter of std::vector::vector that will be placed on the stack and may be misaligned.

Some compilers have a pragma for controlling stack alignment within a function (basically, the compiler wastes some extra space as needed to get all locals properly aligned).

According to the Microsoft documentation, Visual C++ 2010 should set up 16-byte stack alignment automatically for SSE types and has done so since Visual C++ 2003.

For gcc I don't know.


Under C++0x, for new point() to return unaligned storage is a serious non-compliance. [basic.stc.dynamic.allocation] says (wording from draft n3225):

The allocation function attempts to allocate the requested amount of storage. If it is successful, it shall return the address of the start of a block of storage whose length in bytes shall be at least as large as the requested size. There are no constraints on the contents of the allocated storage on return from the allocation function. The order, contiguity, and initial value of storage allocated by successive calls to an allocation function are unspecified. The pointer returned shall be suitably aligned so that it can be converted to a pointer of any complete object type with a fundamental alignment requirement (3.11) and then used to access the object or array in the storage allocated (until the storage is explicitly deallocated by a call to a corresponding deallocation function).

And [basic.align] says:

Additionally, a request for runtime allocation of dynamic storage for which the requested alignment cannot be honored shall be treated as an allocation failure.

Can you try a newer version of gcc where this might be fixed?

The vector constructor you are using is actually defined like this in C++03:

explicit vector ( size_type n, const T& value= T(), const Allocator& = Allocator() );

(see, e.g., http://www.cplusplus.com/reference/stl/vector/vector/).

In other words, a single temporary point is default constructed (it is the default argument value when you call the constructor), and each of the n elements is then copy-constructed from that temporary. My guess is that you need a copy constructor for point that properly handles the (non-)copying of __m128i values.

Update: When I try to build your code with Visual Studio 2010 (v. 10.0.30319.1), I get the following build error:

error C2719: '_Val': formal parameter with __declspec(align('16')) won't be aligned c:\program files\microsoft visual studio 10.0\vc\include\vector 870 1   meh

This suggests Ben is right on the money regarding this being an alignment problem.

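SSE types such as __m128 and __m128i must be 16-byte aligned in memory, because the compiler is free to access them with aligned load and store instructions, which fault on misaligned addresses. When you declare a local __m128 variable, there's no problem because the compiler aligns it on the stack automatically. The default allocator for std::vector<>, which handles dynamic memory allocation, is not guaranteed to produce 16-byte-aligned allocations.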

The memory allocated by the default allocator in your compiler's STL implementation may therefore not be 16-byte aligned; whether it is depends on the specific platform and compiler vendor.

The default allocator typically uses operator new, which is often only guaranteed to return memory aligned to the word size (4 bytes on 32-bit targets, 8 bytes on 64-bit targets). To solve the problem, it may be necessary to implement a custom allocator that uses _aligned_malloc (and releases memory with _aligned_free).

Also, a simple fix (although not a satisfactory one) would be to compute the value in a local __m128i variable, then copy that variable into the struct with an unaligned store instruction. Example:

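#include <emmintrin.h>  // SSE2: __m128i, _mm_setr_epi32, _mm_storeu_si128

struct point {
    __m128i v;
    point() {
        // Build the value in a local, which the compiler aligns on the stack...
        __m128i temp = _mm_setr_epi32(0, 0, 0, 0);
        // ...then copy it into v with an unaligned store (movdqu),
        // which does not fault if v is not 16-byte aligned.
        _mm_storeu_si128(&v, temp);
    }
};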