GCC SSE code optimization

Posted by 。_饼干妹妹 on 2019-11-27 05:29:43
Question

This post is closely related to another one I posted some days ago. This time, I wrote a simple program that adds a pair of arrays element by element, multiplies the result by the values in a third array, and stores it in a fourth array, with all variables typed as double-precision floating point. I made two versions of that code: one that uses SSE instructions through intrinsic calls, and another one without them. I then compiled both with gcc at the -O0 optimization level. Both versions are shown below:

// SSE VERSION
#define N 10000