Compilation of a simple C++ program using SSE intrinsics


北荒 2020-12-16 05:17

I am new to the SSE instructions and I was trying to learn them from this site: http://www.codeproject.com/Articles/4522/Introduction-to-SSE-Programming

I am using t

3 Answers

北海茫月 (original poster)

2020-12-16 05:55

This doesn't directly answer your question, but I want to point out that your SSE code is written incorrectly; I would be surprised if it works. You need to use load/store operations on non-SSE types, and that includes aligned non-SSE types such as your aligned float array (you need to do this even if you have a dynamic array of an SSE type). Keep in mind that when you work with SSE, the SSE data types (such as __m128) are supposed to represent data held in SSE registers, while everything else usually lives in system memory or in non-SSE registers. You therefore have to load from memory into a register and store from a register back to memory explicitly. This is how your function should look:

#include <xmmintrin.h>                // SSE intrinsics (__m128, _mm_load_ps, ...)

// Computes pResult[i] = sqrt(pArray1[i]^2 + pArray2[i]^2) + 0.5 for every element.
// _mm_load_ps/_mm_store_ps require all three arrays to be 16-byte aligned,
// and this loop assumes nSize is a multiple of 4.
void myssefunction
(
    float* pArray1,                   // [in] first source array
    float* pArray2,                   // [in] second source array
    float* pResult,                   // [out] result array
    int nSize                         // [in] size of all arrays
)
{
    const __m128 m0_5 = _mm_set_ps1(0.5f);        // m0_5[0, 1, 2, 3] = 0.5
    for (int index = 0; index < nSize; index += 4) // int matches nSize, avoiding a signed/unsigned comparison
    {
        __m128 src1 = _mm_load_ps(pArray1 + index); // load 4 elements from memory into an SSE register
        __m128 src2 = _mm_load_ps(pArray2 + index); // load 4 elements from memory into an SSE register

        __m128 m1   = _mm_mul_ps(src1, src1);       // m1 = src1 * src1
        __m128 m2   = _mm_mul_ps(src2, src2);       // m2 = src2 * src2
        __m128 m3   = _mm_add_ps(m1, m2);           // m3 = m1 + m2
        __m128 m4   = _mm_sqrt_ps(m3);              // m4 = sqrt(m3)
        __m128 dest = _mm_add_ps(m4, m0_5);         // dest = m4 + 0.5

        _mm_store_ps(pResult + index, dest);        // store 4 elements from the SSE register to memory
    }
}
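
If the arrays cannot be guaranteed to be 16-byte aligned, or the size is not a multiple of 4, the same load/compute/store pattern can still be used. The following is only a sketch under those assumptions, not part of the original answer: the function name sse_unaligned_sketch and its parameter names are made up for illustration. It swaps in the unaligned intrinsics _mm_loadu_ps and _mm_storeu_ps and finishes the leftover elements with a plain scalar loop.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cmath>        // std::sqrt
#include <cstddef>      // std::size_t

// Hypothetical helper (name and signature are assumptions for illustration):
// computes result[i] = sqrt(a[i]^2 + b[i]^2) + 0.5 for any n,
// with no alignment requirement on the pointers.
void sse_unaligned_sketch(const float* a, const float* b, float* result, std::size_t n)
{
    const __m128 half = _mm_set1_ps(0.5f);  // all four lanes hold 0.5

    std::size_t i = 0;
    // Vector part: process 4 floats per iteration while a full group of 4 remains.
    for (; i + 4 <= n; i += 4)
    {
        __m128 x   = _mm_loadu_ps(a + i);   // unaligned load of 4 floats
        __m128 y   = _mm_loadu_ps(b + i);   // unaligned load of 4 floats
        __m128 sum = _mm_add_ps(_mm_mul_ps(x, x), _mm_mul_ps(y, y));
        _mm_storeu_ps(result + i, _mm_add_ps(_mm_sqrt_ps(sum), half));  // unaligned store
    }
    // Scalar tail: handle the remaining 0 to 3 elements one at a time.
    for (; i < n; ++i)
    {
        result[i] = std::sqrt(a[i] * a[i] + b[i] * b[i]) + 0.5f;
    }
}
```

Unaligned loads may be slower than aligned ones on some CPUs, so when you control the allocation it is usually still worth aligning the arrays and keeping _mm_load_ps/_mm_store_ps as in the function above.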

Also worth noting: there is a limit on how many SSE registers can be in use at any one time (8 XMM registers in 32-bit mode, 16 in 64-bit mode). You can write code that tries to keep more values live than that, but the compiler will then have to spill registers to memory, which typically costs extra loads and stores.
