One of the stated reasons for knowing assembler is that, on occasion, it can be used to write code that performs better than the same code written in a higher-level language.
Matrix operations hand-written with SIMD instructions, for example, are probably faster than compiler-generated code.
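
To make the SIMD claim concrete, here is a minimal sketch in C that uses SSE intrinsics as a stand-in for hand-written assembler. It multiplies a 4x4 matrix by a 4-element vector, once with plain scalar loops and once with four packed multiply-adds. The function names and the column-major layout are assumptions made for illustration, and the SIMD path falls back to the scalar one when SSE is not available. Whether it actually beats what the compiler generates depends on the compiler, the optimization flags, and the hardware, so it should be measured rather than assumed.

```c
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: _mm_loadu_ps, _mm_mul_ps, ... */
#endif

/* Scalar reference: out = M * v.
   M is assumed to be column-major, i.e. element (row, col) is m[col * 4 + row]. */
static void mat4_mul_vec4_scalar(const float m[16], const float v[4], float out[4])
{
    for (int row = 0; row < 4; ++row) {
        out[row] = 0.0f;
        for (int col = 0; col < 4; ++col)
            out[row] += m[col * 4 + row] * v[col];
    }
}

/* SIMD version: each column of M is loaded as one 4-float register,
   scaled by the matching element of v, and accumulated. All four
   output rows are computed at the same time. */
static void mat4_mul_vec4_simd(const float m[16], const float v[4], float out[4])
{
#if defined(__SSE__)
    __m128 acc = _mm_mul_ps(_mm_loadu_ps(m + 0), _mm_set1_ps(v[0]));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(m + 4),  _mm_set1_ps(v[1])));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(m + 8),  _mm_set1_ps(v[2])));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(m + 12), _mm_set1_ps(v[3])));
    _mm_storeu_ps(out, acc);
#else
    /* No SSE on this target: use the scalar path so the sketch still builds. */
    mat4_mul_vec4_scalar(m, v, out);
#endif
}
```

Both functions are meant to produce the same result for the same inputs, so the scalar version can serve as a correctness check before any timing comparison.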