It is probably a data alignment issue. _mm256_load_ps requires its source address to be 32-byte (256-bit) aligned, and an aligned load from a misaligned address can fault. The default allocator for std::vector doesn't guarantee that alignment. You'll need to either supply an aligned allocator or use an instruction with a less stringent alignment requirement (such as _mm256_loadu_ps).
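Below is a minimal sketch of both options. It assumes C++17, because it relies on the aligned overloads of ::operator new / ::operator delete that take std::align_val_t. The names AlignedAllocator, AlignedFloats, is_aligned and add8 are made up for this example. The AVX path is only compiled when the compiler targets AVX (e.g. -mavx or -march=native with GCC/Clang). Otherwise a plain scalar loop is used, so the snippet still builds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical allocator that returns memory aligned to `Align` bytes (C++17 aligned new).
template <typename T, std::size_t Align>
struct AlignedAllocator {
    using value_type = T;

    // Required because Align is a non-type template parameter, so
    // std::allocator_traits cannot rebind this allocator automatically.
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Align>; };

    AlignedAllocator() noexcept = default;
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Align>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_array_new_length();
        return static_cast<T*>(::operator new(n * sizeof(T), std::align_val_t{Align}));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, std::align_val_t{Align});
    }
};

// Stateless allocators: any two instances can free each other's memory.
template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return false; }

// A vector<float> whose data() is always 32-byte aligned.
using AlignedFloats = std::vector<float, AlignedAllocator<float, 32>>;

bool is_aligned(const void* p, std::size_t align) {
    return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}

// Adds the first 8 floats of `a` (32-byte aligned) and `b` (ordinary vector,
// no alignment guarantee) element-wise into `out`.
void add8(const AlignedFloats& a, const std::vector<float>& b, float* out) {
    assert(a.size() >= 8 && b.size() >= 8);
#if defined(__AVX__)
    __m256 va = _mm256_load_ps(a.data());   // aligned load: OK because a.data() % 32 == 0
    __m256 vb = _mm256_loadu_ps(b.data());  // unaligned load: works for any address
    _mm256_storeu_ps(out, _mm256_add_ps(va, vb));
#else
    for (int i = 0; i < 8; ++i) out[i] = a[i] + b[i];  // scalar fallback without AVX
#endif
}
```

Note that the allocator only fixes the alignment of data(). If you load from data() + i, i must still be a multiple of 8 floats for _mm256_load_ps to be safe. On many recent x86 CPUs, _mm256_loadu_ps on data that happens to be aligned is about as fast as the aligned load, so using loadu everywhere is often the simpler choice.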