Why does the lw instruction's second argument take in both an offset and regSource?

Question

So the lw instruction is in the following format: lw RegDest, Offset(RegSource). Why does the second argument take in both an offset and register source? Why not only one (i.e. only register source)?

Answer 1:

Because what else are you going to do with the rest of the 32-bit instruction word? (Assuming you're the CPU architect designing the MIPS instruction set).

Leaving out the 16-bit immediate displacement can't make the instruction shorter, because MIPS is a RISC with fixed-length instruction words.
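To make the layout concrete, here is a minimal sketch of how the 32 bits of an lw instruction word are divided in the classic MIPS I-type format: a 6-bit opcode, two 5-bit register fields, and the 16-bit immediate. The helper name encode_lw is made up for illustration and is not part of any toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Classic MIPS I-type layout: opcode[31:26] rs[25:21] rt[20:16] imm[15:0]. */
enum { OPCODE_LW = 0x23 };  /* opcode field value for lw */

/* Hypothetical helper: encode `lw rt, imm(rs)` into one 32-bit word. */
uint32_t encode_lw(unsigned rt, unsigned rs, int16_t imm)
{
    assert(rt < 32 && rs < 32);          /* register numbers are 5 bits each */
    return ((uint32_t)OPCODE_LW << 26)   /* 6 bits of opcode */
         | ((uint32_t)rs << 21)          /* base register */
         | ((uint32_t)rt << 16)          /* destination register */
         | (uint16_t)imm;                /* low 16 bits hold the displacement */
}
```

With $t0 = register 8 and $t1 = register 9, encode_lw(9, 8, 4) gives the word for lw $t1, 4($t0). A displacement of 0 produces a word of exactly the same size, so dropping the offset would only leave those 16 bits unused.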

MIPS doesn't have a lot of different opcodes (especially classic MIPS without any FPU instructions, and without 64-bit instructions). It uses a lot of the instruction coding space to support large immediate operands for most instructions. (Unlike ARM32, for example, which uses 4 bits in each instruction for predicated execution, and more bits for a flexible source operand: an optional rotate or shift by a constant or another register.)

MIPS only has one addressing mode, and imm16(reg) can save a significant number of addiu instructions vs. just (reg).

For example, consider a C function that loads from or stores to a static (or global) variable, like this:

unsigned rng(void) {
    static unsigned seed = 1234;
    return (seed = seed * 5678 + 0x1234);
}

The compiler-generated (or hand-written) asm needs to load and store from seed, so you need its address in a register. But that address is a 32-bit constant that doesn't fit in a single instruction. In hand-written asm you'd probably use a pseudo-instruction like la $t0, rng.seed, which will assemble to lui $t0, hi(rng.seed) / ori $t0, $t0, lo(rng.seed). (hi and lo each get one 16-bit half of the 32-bit address.)

But you can do better than that:

lui   $t0, hi(rng.seed)
lw    $t1, lo(rng.seed) ($t0)

i.e. use the low 16 bits of the address as the 16-bit displacement in the load instruction. This is in fact what compilers like gcc do:

rng:    # gcc5.4 -O3
    lui     $5,%hi(seed.1482)
    lw      $4,%lo(seed.1482)($5)
    nop                       # classic MIPS has a 1-cycle "shadow" for loads before the result is usable, with no pipeline interlock
    sll     $3,$4,5          # I should have picked a simpler multiply constant (with fewer bits set)
    sll     $2,$4,3
    subu    $2,$3,$2
    sll     $3,$2,3
    subu    $2,$3,$2
    subu    $2,$2,$4
    sll     $3,$2,4
    addu    $2,$2,$3
    sll     $2,$2,1
    addiu   $2,$2,4660
    j       $31
    sw      $2,%lo(seed.1482)($5)       # branch-delay slot

seed.1482:
    .word   1234
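One detail worth spelling out: the 16-bit displacement in lw/sw is sign-extended, so %hi has to round the upper half up whenever bit 15 of the address is set, otherwise lui + displacement would land 64 KiB too high. The following is a small C model of that arithmetic, not of any real assembler's code. The function names and the example addresses in the usage note are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Upper half, rounded so that adding the sign-extended low half
   reconstructs the full address (the value %hi supplies to lui). */
uint16_t hi16(uint32_t addr)
{
    return (uint16_t)((addr + 0x8000u) >> 16);
}

/* Low half, interpreted as a signed 16-bit displacement (what %lo supplies). */
int32_t lo16(uint32_t addr)
{
    return (int32_t)((addr & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* lui places its immediate in the upper 16 bits and zeroes the lower 16. */
uint32_t lui(uint16_t imm)
{
    return (uint32_t)imm << 16;
}

/* Address formed by a load or store: base register plus the
   sign-extended displacement, wrapping modulo 2^32. */
uint32_t effective_address(uint32_t base, int32_t disp)
{
    return base + (uint32_t)disp;
}
```

For a made-up address such as 0x10009ABC, hi16 gives 0x1001 and lo16 gives -0x6544, and effective_address(lui(0x1001), -0x6544) gets back to 0x10009ABC. For 0x10001234, where bit 15 is clear, the halves are simply 0x1000 and 0x1234.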

There are lots of other uses for small immediate displacements from a register. For example:

Accessing locals on the stack if the compiler spills anything.
Struct fields.
Array access in an unrolled loop. (MIPS has 32 integer registers, and is pretty much designed for unrolling and software-pipelining loops.)
Small compile-time-constant array indices.
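As a sketch of the struct-field and constant-index cases, the C below has offsets that are known at compile time and fit easily in a 16-bit displacement. The type and function names are hypothetical, and the asm in the comments is what a MIPS compiler could plausibly emit for the load, not captured compiler output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type, for illustration only. */
struct point3 {
    int32_t x;   /* byte offset 0 */
    int32_t y;   /* byte offset 4 */
    int32_t z;   /* byte offset 8 */
};

/* The field offset is a small compile-time constant. */
static_assert(offsetof(struct point3, z) == 8, "z is 8 bytes past the base");

/* With p in $a0, the field access can be one load: lw $v0, 8($a0) */
int32_t get_z(const struct point3 *p)
{
    return p->z;
}

/* Constant index: byte offset 3 * sizeof(int32_t) = 12 is known at
   compile time, so this can also be one load: lw $v0, 12($a0) */
int32_t fourth(const int32_t *a)
{
    return a[3];
}
```

Without a displacement field, each of these accesses would need a separate addiu to form the address before the load.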

As I said, there isn't much else you could do with those extra 16 bits of the instruction word that would be a good fit for MIPS. You could leave fewer than 16 bits for the displacement, but MIPS isn't PowerPC (where there are lots and lots of opcodes).

Source: https://stackoverflow.com/questions/48140313/why-does-the-lw-instructions-second-argument-take-in-both-an-offset-and-regsour

Tags

assembly

mips

cpu-architecture