Is there a simple tutorial for me to get up to speed in SSE, SSE2 and SSE3 in GNU C++? How can you do code optimization in SSE?
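Start with GCC's code-generation options. -march selects the target CPU and with it the instruction sets GCC may use; -mtune only adjusts scheduling for a given CPU and does not enable any new instructions; -msse, -msse2 and -msse3 switch on the individual extensions; and -mfpmath=sse makes GCC use the SSE registers instead of the x87 unit for scalar floating-point math. With these set, GCC can emit SSE code on its own. Automatic vectorization of loops is controlled by -ftree-vectorize, which -O3 turns on.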
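To see what the flags do in practice, here is a minimal sketch of a loop that GCC is usually able to vectorize by itself. The function name saxpy and the file name in the comment are made up for illustration, and whether the loop is actually vectorized depends on your GCC version and target, so treat it as something to experiment with rather than a guarantee.

```cpp
#include <cstddef>

// Hypothetical example: y[i] += a * x[i] for n floats.
// __restrict__ is a GCC extension that promises x and y do not overlap,
// which removes one common obstacle to vectorization.
//
// One way to compile it (file name is an assumption):
//   g++ -O3 -march=native -mfpmath=sse -fopt-info-vec -c saxpy.cpp
// -fopt-info-vec (available in newer GCC releases) reports which loops
// were vectorized.
void saxpy(float a, const float* __restrict__ x, float* __restrict__ y,
           std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

Note that the source stays plain C++; only the command line changes, which is why trying the flags first is the cheapest option.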
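Beyond that, you have two choices: the SSE intrinsics that GCC ships in <xmmintrin.h> (SSE), <emmintrin.h> (SSE2) and <pmmintrin.h> (SSE3), which let you issue SSE instructions from C++ code, or hand-written assembler.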
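For what it's worth, GCC also exposes the SSE instructions as C/C++ intrinsics, which sit somewhere between the flags above and raw assembler. The following is a minimal sketch using only the original SSE set from <xmmintrin.h>; the function name add_arrays is hypothetical, and the scalar fallback is there only so the code still builds when SSE is not enabled.

```cpp
#include <cstddef>
#ifdef __SSE__             // predefined by GCC when SSE is enabled (e.g. -msse)
#include <xmmintrin.h>     // SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, ...
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for n floats.
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#ifdef __SSE__
    // Process four floats per iteration in one 128-bit register.
    // The unaligned load/store variants work for any pointer alignment.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar loop handles the remaining elements (or everything without SSE).
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

If your data is 16-byte aligned you can switch to _mm_load_ps and _mm_store_ps, but those fault on misaligned addresses, so the unaligned variants are the safer starting point.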
GCC Online Manual - i386 and x86_64 Options