Does Program Counter hold current address or the address of the next instruction?

抹茶落季 2020-12-06 12:57

Being a beginner and self-learner, I am learning assembly and currently reading chapter 3 of the book The C Companion by Allen Holub. I can't understand the descript

3 answers
  •  离开以前
    2020-12-06 13:14

    Those claims could be talking about two different points in time, during vs. after the execution of an instruction.

    What was in those [...] that you omitted? Did it talk about finishing execution of one instruction and starting to fetch the next instruction, after incrementing PC by 2 bytes / 1 instruction-word?

    Otherwise it's an error in the book, because those two claims (that PC points to the current instruction vs. the next instruction during execution of the current instruction) are incompatible.

    "I fail to understand the difference between holding the current address and the address of the next instruction"

    Consider these (x86) instructions in memory, using 2-byte instructions to match the ISA from your book (x86 instructions are variable length, from 1 to 15 bytes, including optional / mandatory prefix bytes):

     a:  0x66 0x90     nop
     c:  0x66 0x90     nop
    

    Each instruction has its own address. I've indicated their starting addresses with hex digits (which could also be symbolic labels in assembler syntax, but this is intended to be a mockup of disassembler output, like objdump -d). The "address of an instruction" is the address of its first byte in memory, regardless of what the architectural PC would hold before/during/after executing it.

    While the first nop is executing, the address of the next instruction is c. The "current instruction" is the first nop, regardless of what value PC (logically) has while it executes.
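    As a minimal sketch of that bookkeeping (the function name is made up for illustration, and the start address and lengths are just the two nops from the listing above), instruction addresses follow from the start address and the instruction lengths alone, with no reference to what PC holds:

```python
def instruction_addresses(start, lengths):
    """Return the start address of each instruction laid out back to back."""
    addresses = []
    addr = start
    for length in lengths:
        addresses.append(addr)   # address of an instruction = address of its first byte
        addr += length           # the next instruction starts right after this one
    return addresses

# The two 2-byte nops from the listing, starting at address 0xa.
addrs = instruction_addresses(0xA, [2, 2])
print([hex(a) for a in addrs])
```

    The second entry is the "address of the next instruction" while the first nop executes.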


    Most instructions don't actually read PC as a data input. Only relative jumps and PC-relative loads/stores need it. (And thus the compiler/assembler needs to know the rule for calculating relative displacements.)

    MIPS and RISC-V also/instead have auipc instructions that add an immediate to the program counter, and put the result in another register. So instead of a PC-relative addressing mode, they have a PC-relative add, to produce a pointer you can then use with a normal addressing mode. But same difference, really.
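    To make the PC-relative add concrete, here is a rough Python model of RISC-V's auipc on RV32, as I understand the spec: it adds a 20-bit immediate, shifted left by 12, to the address of the auipc instruction itself. The helper names (split_offset, pc_relative_address) are hypothetical, and the follow-up add of the low 12 bits stands in for an addi or a load/store offset.

```python
MASK32 = 0xFFFFFFFF

def auipc(pc, imm20):
    """Model of RV32 auipc: rd = pc + (imm20 << 12), wrapping at 32 bits.

    pc is the address of the auipc instruction itself.
    """
    return (pc + ((imm20 & 0xFFFFF) << 12)) & MASK32

def split_offset(offset):
    """Split a signed offset into (hi20, lo12) with lo12 in [-2048, 2047].

    The +0x800 rounding compensates for lo12 being sign-extended
    when it is added back by the following instruction.
    """
    hi20 = ((offset + 0x800) >> 12) & 0xFFFFF
    lo12 = ((offset + 0x800) & 0xFFF) - 0x800
    return hi20, lo12

def pc_relative_address(pc, offset):
    """Address produced by auipc followed by adding the low 12 bits."""
    hi20, lo12 = split_offset(offset)
    return (auipc(pc, hi20) + lo12) & MASK32

print(hex(pc_relative_address(0x1000, 0x1234)))
```

    The assembler only needs to know the one rule (which address auipc adds to) to pick hi20 and lo12 correctly.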

    As long as there's a consistent rule for the logical value of PC during the execution of an instruction, it doesn't really matter what the exact rule is.

    • PC = start of current instruction (e.g. MIPS logically works this way, regardless of what internal implementations actually do).

      MIPS relative branches are relative to PC + 4 (i.e. relative to the next instruction, so for this purpose it's just a quirk of how it's documented), but MIPS jumps replace the low 28 bits of PC, not of PC+4 (which potentially differs in its high bits). See also http://www.cim.mcgill.ca/~langer/273/13-datapath1.pdf which goes over the logical operation of instruction fetch / execute on MIPS.

    • PC = start of next instruction (common, e.g. x86)

    • PC = start of 2 instructions later. (e.g. ARM)

      See: Why does the ARM PC register point to the instruction after the next one to be executed? TL;DR: an artifact of a 3-stage fetch-decode-execute pipeline front-end in early ARM designs. (32-bit ARM exposes the program counter as r15, one of the 16 "general purpose" registers, so you can actually jump with orr pc, r0, #4 or something, as well as read it in any instruction for PC-relative addressing.)
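    A small sketch of why the exact rule doesn't matter as long as it's consistent: the same branch target gets a different encoded displacement under each of the three conventions above, but decoding with the same rule always recovers the same target. The convention names and function names here are made up for illustration, and a fixed instruction length is assumed.

```python
def pc_offset(convention, insn_len):
    """Distance from the branch's own address to the logical PC value."""
    if convention == "current":     # PC = start of this instruction
        return 0
    if convention == "next":        # PC = start of the next instruction
        return insn_len
    if convention == "two_ahead":   # PC = start of 2 instructions later
        return 2 * insn_len
    raise ValueError(f"unknown convention: {convention}")

def encode_displacement(branch_addr, target_addr, convention, insn_len):
    """What the assembler would store in the branch instruction."""
    return target_addr - (branch_addr + pc_offset(convention, insn_len))

def branch_target(branch_addr, disp, convention, insn_len):
    """What the CPU computes when the branch executes."""
    return branch_addr + pc_offset(convention, insn_len) + disp

for conv in ("current", "next", "two_ahead"):
    disp = encode_displacement(0x100, 0x120, conv, insn_len=4)
    print(conv, disp, hex(branch_target(0x100, disp, conv, insn_len=4)))
```

    For a 4-byte branch at 0x100 targeting 0x120, the encoded displacements come out as 32, 28 and 24 bytes respectively, and all three decode back to 0x120.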

    As @Ross says, only a simple non-pipelined CPU will have a single physical program-counter register. (See also: How does branch prediction interact with the instruction pointer.)

    But if any instruction raises an exception (faults), it usually needs to store either the address of the faulting instruction, or the address of the next instruction, somewhere. That depends on what kind of exception it is. A debug / single-step exception would store the address of the next instruction, so returning from the exception handler would step. A page-fault would store the address of the faulting instruction so the default action is to retry it.
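    Sketching that rule in Python (the exception-kind strings and the function name are hypothetical; real ISAs define their own exception classes and where the address gets saved):

```python
def saved_return_address(insn_addr, insn_len, kind):
    """Address the hardware would save when an exception is taken.

    insn_addr and insn_len describe the instruction that raised it.
    """
    if kind == "page_fault":
        # Save the faulting instruction itself, so returning retries it.
        return insn_addr
    if kind == "single_step":
        # Save the following instruction, so returning steps forward.
        return insn_addr + insn_len
    raise ValueError(f"unknown exception kind: {kind}")

# First nop from the listing: address 0xa, 2 bytes long.
print(hex(saved_return_address(0xA, 2, "page_fault")))
print(hex(saved_return_address(0xA, 2, "single_step")))
```

    Either way the hardware needs the instruction's start address, or its end address plus its length, whichever value PC logically held during execution.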

    The exception-handling rules are going to be separate from the normal PC-during-execution rules, so the hardware has to remember instruction lengths or instruction-start addresses to be able to handle exceptions. It doesn't have to be efficient, because interrupts/exceptions are rare; it's ok for the CPU to take multiple cycles before it even jumps to the interrupt handler. (The normal-operation case of PC-relative addressing modes, and call instructions, does have to be efficient.)


    Implications of a simple physical implementation with PC=current instruction

    Having a PC that holds the address of the current instruction is a valid design.

    For a superscalar pipelined design, especially with Out-of-Order execution, it makes no real difference. The pipeline needs to track the address (and length if variable) of each instruction as it goes through the pipeline, because it can fetch/decode/execute more than 1 per cycle. It fetches in large blocks, and decodes up to n instructions from that block. Some implementations might require fetch-blocks to be 16-byte aligned, for example. (See https://agner.org/optimize/ for details on how various x86 microarchitectures do it, and how to optimize for the front-end fetch/decode patterns in Pentium, Pentium Pro, Nehalem, etc. Fortunately modern x86 CPUs have decoded-uop caches and are much less sensitive to fetch/decode issues in loops.)

    (Semi-related: x86 registers: MBR/MDR and instruction registers modern)

    For a simple in-order non-pipelined CPU with a single physical PC register, it would mean the instruction-fetch logic needs to calculate a next-PC, or else the next instruction can't even be fetched while executing the current.
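    A toy model of such a CPU, as a sketch only: the opcodes and the 2-byte (opcode, operand) encoding are invented here, loosely following the 2-byte instructions of the toy ISA from your book. PC holds the address of the current instruction for the whole step, and a separate next_pc value is what the fetch logic has to compute.

```python
INSN_LEN = 2                   # every instruction: 1 opcode byte + 1 operand byte
NOP, JMP_REL, HALT = 0, 1, 2   # hypothetical opcodes

def run(memory, pc=0, max_steps=100):
    """Execute until HALT; return the PC value seen by each instruction."""
    trace = []
    for _ in range(max_steps):
        opcode, operand = memory[pc], memory[pc + 1]   # fetch
        trace.append(pc)               # PC = address of the current instruction
        next_pc = pc + INSN_LEN        # computed by the fetch logic, not stored in PC yet
        if opcode == HALT:
            break
        if opcode == JMP_REL:
            # Signed 8-bit displacement, relative to the jump's own address.
            disp = operand - 256 if operand >= 128 else operand
            next_pc = pc + disp
        pc = next_pc                   # PC only changes between instructions
    return trace

program = bytes([NOP, 0,        # address 0
                 JMP_REL, 4,    # address 2: jump to 2 + 4 = 6
                 NOP, 0,        # address 4: skipped
                 HALT, 0])      # address 6
print(run(program))
```

    With a "PC = next instruction" design, next_pc would simply be written into PC right after fetch, and the jump would add its displacement to that value instead.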

    In x86, IP / EIP / RIP logically holds the address of the next instruction while the current one is being executed. This makes sense given its origins in 8086, which only had ~29k transistors. It prefetched from the instruction stream while the current insn was being executed (into a small 6-byte buffer, which isn't even long enough to hold a whole instruction if extra prefixes are used, but which holds 6 single-byte instructions). But it didn't even start decoding the next until the current one was finished. (i.e. not pipelined at all, or arguably 2-stage if you count prefetch, which is very easy to decouple. This remained the case until 486, I think.)

    With a variable-length ISA, instruction length isn't discovered until decode. Having PC = end of current instruction may matter more there, because you can't just calculate PC+4 the way MIPS can, or PC+2 with your toy ISA. But you also can't go backwards unless you know the instruction length, so to properly handle exceptions 8086 must have tracked the instruction start as well, or remembered the instruction length.
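    A sketch of that bookkeeping with a made-up variable-length encoding (here the first byte of each instruction is simply its total length in bytes, which is not how x86 works): the end address only becomes known after decoding, and the start has to be recorded alongside it if you ever need to go back.

```python
def walk(code, start=0):
    """Return a (start, end) pair for each instruction in a byte stream.

    Hypothetical encoding: the first byte of an instruction gives its
    total length in bytes, so the length is only known after decoding it.
    """
    records = []
    pc = start
    while pc < len(code):
        length = code[pc]
        if length == 0:
            raise ValueError("zero-length instruction")
        records.append((pc, pc + length))   # keep the start; the end alone can't recover it
        pc += length                        # end of this instruction = start of the next
    return records

# Three instructions of length 1, 3 and 2 bytes.
code = bytes([1, 3, 0, 0, 2, 0])
print(walk(code))
```

    With fixed 4-byte or 2-byte instructions the start is always end minus a constant, which is why this extra state is only needed for the variable-length case.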
