Interrupting an assembly instruction while it is operating

问题

When an interrupt comes to CPU, it is handled by saving current address location prior jumping into the handler if it is acknowledged. Otherwise it is ignored.

I wonder whether an assembly instruction call is interrupted.

For example,

mvi a, 03h ; put 3 value into acc. in 8080 assembly

Can be the one line instruction interrupted? Or if not, it is atomic??

Is there always a guarantee that "one line assembly instruction" is always atomic??

What if there is no "lock" keyword i.e. in 8080 assembly, then how is the atomicity provided?

For example, what if 64 bit sum is wanted to be operated, but there is no way to do it with "one line instruction" and an interrupt comes while operating on sum. How can it be prevented at assembly level??

The concept is being started to boil down for me.

回答1:

I'm not sure the 8080 was designed to be used in multi-CPU systems with shared RAM, which, however doesn't necessarily imply impossibility or nonexistence of such systems. The 8086 lock prefix is for such systems to ensure just one CPU can have exclusive access to memory while executing a sequence of memory read, value modification, memory write (RMW). The lock prefix isn't there to guard an instruction or a few instructions from being preempted by an interrupt handler.

You can be sure that individual instructions don't somehow get interrupted in mid-flight. Either they're let to run until completion or any of their side effects are reverted and they are restarted at a later time. That's a common implementation on most CPUs. Without it it would be hard to write well behaving code in presence of interrupts.

Indeed, you cannot perform a 64-bit addition with a single 8080 instruction, so, that operation can be preempted by the ISR.

If you don't want that preemption at all, you can guard your 64-bit add with interrupt disable and enable instructions (DI and EI).

If you want to let the ISR preempt the 64-bit but without disturbing the registers that the 64-bit add uses, the ISR must save and restore those registers by e.g. using the PUSH and POP instructions.

Find a 8080 manual for detailed description of interrupt handling (e.g. here).

回答2:

Yes all "normal" ISAs including 8080 and x86 guarantee that instructions are atomic with respect to interrupts on the same core. Either an instruction has fully executed and all its architectural effects are visible (in the interrupt handler), or none of them are. Any deviations from this rule are generally carefully documented.

For example, Intel's x86 manual vol.3 (~1000 page PDF) does make a point of specifically saying this:

6.6 PROGRAM OR TASK RESTART
To allow the restarting of program or task following the handling of an exception or an interrupt, all exceptions (except aborts) are guaranteed to report exceptions on an instruction boundary. All interrupts are guaranteed to be taken on an instruction boundary.

An old paragraph in Intel's vol.1 manual talks about single-core systems using cmpxchg without a lock prefix to read-modify-write atomically (with respect to other software, not hardware DMA access).

The CMPXCHG instruction is commonly used for testing and modifying semaphores. It checks to see if a semaphore is free. If the semaphore is free, it is marked allocated; otherwise it gets the ID of the current owner. This is all done in one uninterruptible operation. [because it's a single instruction] In a single-processor system, the CMPXCHG instruction eliminates the need to switch to protection level 0 (to disable interrupts) before executing multiple instructions to test and modify a semaphore.

For multiple processor systems, CMPXCHG can be combined with the LOCK prefix to perform the compare and exchange operation atomically. (See “Locked Atomic Operations” in Chapter 8, “Multiple-Processor Management,” of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for more information on atomic operations.)

(For more about the lock prefix and how it's implemented vs. non-locked add [mem], 1, see Can num++ be atomic for 'int num'?)

As Intel points out in that first paragraph, one way to achieve multi-instruction atomicity is to disable interrupts, then re-enable when you're done. This is better than using a mutex to protect a larger integer, especially if you're talking about data shared between the main program and an interrupt handler. If an interrupt happens while the main program holds the lock, it can't wait for the the lock to be release; that would never happen.

Disabling interrupts is usually pretty cheap on simple in-order pipelines, or especially microcontrollers. (Sometimes you need to save the previous interrupt state, instead of unconditionally enabling interrupts. e.g. a function that might be called with interrupts already disabled.)

Anyway, disabling interrupts is how you could atomically do something with a 64-bit integer on 8080.

A few long-running instructions are interruptible, according to rules documented for that instruction.

e.g. x86's rep-string instructions, like rep movsb (single-instruction memcpy of arbitrary size) are architecturally equivalent to repeating the base instruction (movsb) RCX times, decrementing RCX each time and incrementing or decrementing the pointer inputs (RSI and RDI). An interrupt arriving during a copy can set RCX "starting_value - byte_copied, so on resuming after the interrupt therep movsb` will run again and do the rest of the copy.

Other x86 examples include SIMD gather loads (AVX2/AVX512) and scatter stores (AVX512). e.g. vpgatherdd ymm0, [rdi + ymm1*4], ymm2 does up to 8 32-bit loads, according to which elements of ymm2 are set. And the results are merged into ymm0.

In the normal case (no interrupts, no page faults or other synchronous exceptions during the gather), you get the data in the destination register, and the mask register ends up zeroed. The mask register thus gives the CPU somewhere to store progress.

Gather and scatter are slow, and might need to trigger multiple page faults, so for synchronous exceptions this guarantees forward progress even under pathological conditions where handling a page fault unmaps all other pages. But more relevantly, it means avoiding redoing TLB misses if a middle element page faults, and not discarding work if an async interrupt arrives.

Some other long-running instructions (like wbinvd which flushes all data caches across all cores) are not architecturally interruptible, or even microarchitecturally abortable (to discard partial work and go handle an interrupt). It's privileged so user-space can't execute it as a denial-of-service attack causing high interrupt latency.

related example of documenting funny behaviour is when x86 popad goes off the top of the stack (segment limit). This is for an exception (not an external interrupt), documented earlier in the vol.3 manual, in section 6.5 EXCEPTION CLASSIFICATIONS (i.e. fault / trap / abort, see the PDF for more details.)

NOTE
One exception subset normally reported as a fault is not restartable. Such exceptions result in loss of some processor state. For example, executing a POPAD instruction where the stack frame crosses over the end of the stack segment causes a fault to be reported. In this situation, the exception handler sees that the instruction pointer (CS:EIP) has been restored as if the POPAD instruction had not been executed. However, internal processor state (the general-purpose registers) will have been modified. Such cases are considered programming errors. An application causing this class of exceptions should be terminated by the operating system.

Note that this is only if popad itself causes an exception, not for any other reason. An external interrupt can't split popad the way it can for rep movsb or vpgatherdd

(I guess for the purposes of popad faulting, it effectively works iteratively, popping 1 register at a time and logically modifying RSP/ESP/SP as well as the target register. Instead of checking the whole region it's going to load for segment limit before starting, because that would require an extra add, I guess.)

Out-of-order CPUs roll back to the retirement state on interrupts.

CPUs like modern x86 with out-of-order execution and splitting complex instructions into multiple uops still ensure this is the case. When an interrupt arrives, the CPU has to pick a point between two instructions it's in the middle of running as the location where the interrupt architecturally happens. It has to discard any work that's already done on decoding or starting to execute any later instructions. Assuming the interrupt returns, they'll be re-fetched and start over again executing.

See When an interrupt occurs, what happens to instructions in the pipeline?.

As Andy Glew says, current CPUs don't rename the privilege level, so what logically has to happen (discarding later instructions from the pipeline) matches what actually happens. Fun fact, though: x86 interrupts aren't fully serializing. That means they don't have to flush the store buffer as well as drain the out-of-order execution ReOrder Buffer. They do roll the OoO backend back to the retirement state (because it doesn't rename the privilege level, so it can't safely handle having user and kernel uops in flight at once), but the interrupt handler can potentially start executing before the store buffer has committed all pending stores to L1d.

(Store data from retired store instructions is not speculative; it definitely will happen, and the CPU has already dropped the state it would need to be able to roll back to before that store instruction. So a large store buffer full of scattered cache-miss stores can hurt interrupt latency.)

来源：https://stackoverflow.com/questions/55180452/interrupting-an-assembly-instruction-while-it-is-operating

标签

assembly

interrupt

cpu-architecture

atomicity

interrupted-exception