GCC SSE code optimization
问题 This post is closely related to another one I posted some days ago. This time, I wrote a simple code that just adds a pair of arrays of elements, multiplies the result by the values in another array and stores it in a forth array, all variables floating point double precision typed. I made two versions of that code: one with SSE instructions, using calls to and another one without them I then compiled them with gcc and -O0 optimization level. I write them below: // SSE VERSION #define N 10000