MITE (legacy pipeline) used instead of DSB (uops cache) when jump is not quite aligned on 32 bytes
Question

This question used to be part of this (now updated) question, but it seems better to ask it separately, since it didn't help to get an answer to the other one.

My starting point is a loop doing 3 independent additions:

    for (unsigned long i = 0; i < 2000000000; i++) {
        asm volatile("" : "+r" (a), "+r" (b), "+r" (c), "+r" (d)); // prevents C compiler from optimizing out adds
        a = a + d;
        b = b + d;
        c = c + d;
    }

When this loop is not unrolled, each iteration executes in 1 cycle (which is to be