So when you add an optimization flag when compiling your C++, it runs faster, but how does this work? Could someone explain what really goes on in the assembly?
Most of the optimization happens in the compiler's intermediate representation before the assembly is generated. You should definitely check out Agner Fog's Software optimization resources. Chapter 8 of the 1st manual describes optimizations performed by the compiler with examples.