Addressing a non-integer address, and sse

别等时光非礼了梦想, submitted on 2019-12-23 04:56:47

Question


I am trying to speed up my code using SSE, and the following code works well. Basically, an __m128 covers 4 consecutive floats, so that 4 operations can be done at once.

This code is equivalent to computing c[i]=a[i]+b[i] with i from 0 to 3.

float *data1, *data2, *data3;
// ... code ... allocating data1, data2 and data3, which are very long.
// (the (__m128*) casts below require 16-byte-aligned data)
__m128* a = (__m128*) (data1);
__m128* b = (__m128*) (data2);
__m128* c = (__m128*) (data3);
*c = _mm_add_ps(*a, *b);

However, when I shift the data I use slightly (see below) in order to compute c[i]=a[i+1]+b[i] with i from 0 to 3, it crashes at run time.

__m128* a = (__m128*) (data1+1); // <-- +1
__m128* b = (__m128*) (data2);
__m128* c = (__m128*) (data3);
*c = _mm_add_ps(*a, *b);

My guess is that it is related to the fact that an __m128 is 128 bits while my float data are 32 bits. So it may be impossible for a 128-bit pointer to point to an address that is not 128-bit (16-byte) aligned.

Anyway, do you know what the problem is and how I could get around it?
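To make the guess above concrete: the requirement is counted in bytes. An __m128 is 16 bytes, and dereferencing an `__m128*` usually compiles to an aligned SSE load (for example `movaps`), which faults when the address is not a multiple of 16. Since a float is 4 bytes, `data1+1` sits 4 bytes past a 16-byte boundary whenever `data1` itself is aligned. This minimal sketch assumes a compiler whose `<xmmintrin.h>` declares `_mm_malloc` and `_mm_free` (as GCC's and Clang's do); the helper names `offset_from_16` and `offset_of_element` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics; also declares _mm_malloc/_mm_free on GCC/Clang */

/* Byte distance of p from the previous 16-byte (128-bit) boundary.
   An aligned SSE access through an (__m128*) needs this to be 0. */
static size_t offset_from_16(const void *p)
{
    return (size_t)((uintptr_t)p % 16u);
}

/* Hypothetical helper: allocate a 16-byte-aligned float array and report
   how far element k lies from a 16-byte boundary. */
static size_t offset_of_element(size_t k)
{
    float *data1 = (float *)_mm_malloc(64 * sizeof(float), 16);
    if (data1 == NULL)
        return (size_t)-1;               /* allocation failed */
    size_t off = offset_from_16(data1 + k);
    _mm_free(data1);
    return off;
}
```

`offset_of_element(1)` is 4, so `*(__m128*)(data1+1)` asks for an aligned load from a misaligned address.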


Answer 1:


Instead of using implicit aligned loads/stores like this:

__m128* a = (__m128*) (data1+1); // <-- +1
__m128* b = (__m128*) (data2);
__m128* c = (__m128*) (data3);
*c = _mm_add_ps(*a, *b);

use explicit aligned/unaligned loads/stores as appropriate, e.g.:

__m128 va = _mm_loadu_ps(data1+1); // <-- +1 (NB: use unaligned load)
__m128 vb = _mm_load_ps(data2);
__m128 vc = _mm_add_ps(va, vb);
_mm_store_ps(data3, vc);

Same amount of code (i.e. same number of instructions), but it won't crash, and you have explicit control over which loads/stores are aligned and which are unaligned.

Note that recent CPUs have relatively small penalties for unaligned loads, but on older CPUs there can be a 2x or greater hit.
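Applied to the whole of the long arrays from the question, the explicit-load approach might look like this sketch. It assumes `a` holds at least `n+1` floats, while `b` and `c` hold `n` floats and are 16-byte aligned (for example allocated with `_mm_malloc(..., 16)`); the function name `add_shifted` is hypothetical. A scalar loop handles the last `n % 4` elements so that no load runs past the end of `a`.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: _mm_loadu_ps, _mm_load_ps, _mm_add_ps, _mm_store_ps */

/* c[i] = a[i+1] + b[i] for i in [0, n).
   Assumptions (for illustration): a holds at least n+1 floats; b and c hold
   n floats and are 16-byte aligned. a is only read with unaligned loads,
   so its alignment does not matter. */
static void add_shifted(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;
    /* Vector part: reads a[i+1..i+4] and b[i..i+3], writes c[i..i+3]. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i + 1);  /* unaligned: a+i+1 is generally not 16-byte aligned */
        __m128 vb = _mm_load_ps(b + i);       /* aligned: i is a multiple of 4 and b is aligned */
        _mm_store_ps(c + i, _mm_add_ps(va, vb));
    }
    /* Scalar tail for the last n % 4 elements; never reads beyond a[n]. */
    for (; i < n; ++i)
        c[i] = a[i + 1] + b[i];
}
```

If the alignment of `b` and `c` is not guaranteed either, `_mm_loadu_ps` and `_mm_storeu_ps` can be used for them as well.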




Answer 2:


Your problem here is that a ends up pointing to something that is not an __m128: it points to 128 bits made up of the last 96 bits of one __m128 plus 32 bits beyond it, which can be anything. Those 32 bits may be the first float of the next __m128, but when you reach the last __m128 in the memory block, they lie outside the block, possibly in memory you are not allowed to access, hence the crash.
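The end-of-block effect described above can be quantified with a small hypothetical helper: it counts how many complete 4-float reads fit in a block of `n` floats when the reads start at element `offset` and advance by 4. For an 8-float block, starting at offset 0 gives 2 full reads, but starting at offset 1 gives only 1, because a second read would cover elements 5 to 8 and element 8 is outside the block.

```c
#include <stddef.h>

/* Number of complete 4-float (__m128-sized) reads that stay inside a block of
   n floats when the first read starts at element `offset` and each next read
   starts 4 elements later. Hypothetical helper for illustration. */
static size_t full_vectors_in_block(size_t n, size_t offset)
{
    if (offset >= n)
        return 0;               /* no element left at or after offset */
    return (n - offset) / 4;    /* remaining elements, in whole groups of 4 */
}
```

Elements left over after the last full read need scalar handling.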




Answer 3:


I'm not very familiar with SSE, but I think you can make a local copy (or another copy) of the data that is properly aligned to 128 bits and contains the 4 floats starting at data1 + 1.

Hope this helps, Razvan.
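Answer 3's idea can be sketched as follows, assuming a C11 compiler (for `alignas` from `<stdalign.h>`); the function names are hypothetical. The 4 floats starting at `data1 + 1` are copied with `memcpy` into a 16-byte-aligned local array, which can then be read with the aligned `_mm_load_ps`. `b` and `c` are assumed to be 16-byte aligned.

```c
#include <stdalign.h>   /* alignas (C11) */
#include <string.h>     /* memcpy */
#include <xmmintrin.h>  /* SSE intrinsics */

/* Copy 4 floats from a possibly misaligned address into a 16-byte-aligned
   local buffer, then load the copy with an aligned load. */
static __m128 load_via_aligned_copy(const float *src)
{
    alignas(16) float tmp[4];
    memcpy(tmp, src, sizeof tmp);   /* src may have any alignment */
    return _mm_load_ps(tmp);        /* tmp is 16-byte aligned */
}

/* c[i] = a[i+1] + b[i] for i = 0..3, as in the question.
   Assumes b and c are 16-byte aligned. */
static void add_shifted4_copy(const float *a, const float *b, float *c)
{
    __m128 va = load_via_aligned_copy(a + 1);
    _mm_store_ps(c, _mm_add_ps(va, _mm_load_ps(b)));
}
```

In practice `_mm_loadu_ps(data1 + 1)`, as in answer 1, does the same job in a single instruction, and compilers may reduce such a `memcpy` to an unaligned load anyway.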



Source: https://stackoverflow.com/questions/19376042/addressing-a-non-integer-address-and-sse
