Regarding instruction ordering in executions of cache-miss loads before cache-hit stores on x86
Given the small program shown below (handcrafted to look the same from a sequential consistency / TSO perspective), and assuming it's being run by a superscalar out-of-order x86 CPU:

    Load A         <-- A is in main memory
    Load B         <-- B is in L2
    Store C, 123   <-- C is in L1

I have a few questions:

Assuming a big enough instruction window, will the three instructions be fetched, decoded, and executed at the same time? I assume not, as that would break execution in program order.

The first load is going to take longer, because A has to be fetched from main memory while B is already in L2. Will the latter have to wait until the first is fully executed?

Will