MIPS labels storage location

Question

In MIPS, while using a jump instruction, we use a label.

again: nop
    j again

So when we reach the jump instruction, the label again tells it where to go, and the actual address at that point is used. I want to know where the label again itself is stored. Say nop is stored at 0x00400000 and the jump instruction is at 0x00400004. Where, then, is again kept, and how does MIPS know that again refers to 0x00400000? Is it stored in the Dynamic Data area of the memory map? This is the memory map I've been provided for MIPS.

I've also included the question which caused this confusion below, for reference.

Give the object code in hexadecimal for the following branch (beq, bne) and jump (j) instructions.

... # some other instructions
again:  add ... # there is an instruction here and meaning is insignificant
    add ... # likewise for the other similar cases
    beq    $t0, $t1, next
    bne  $t0, $t1, again
    add ...
    add ...
    add ...
next:   j   again

Assume that the label again is located at memory location 0x10010020. If you think that you do not have enough information to generate the code, explain.

Answer 1:

Each label corresponds to a unique address in memory. So in your example, and in agreement with what you stated, if the nop instruction exists at 0x00400000 then again will correspond (not point--more on that in a second) to that same address.

Labels can exist in both the text and data segments. However, in your example the label is in the .text segment, so it represents the address of an instruction as opposed to a variable.

Here's the important distinction:

Labels are a part of most ISAs to make writing assembly easier for humans. However, it's important to remember that assembly is not the final form of code. In other words, in the binary representation your label won't be much of a label anymore.

So, this is what will happen:

The assembler works out the memory address of the instruction that each label marks. Let's keep our running example of 0x00400000. Then, in each jump instruction, it takes this address and encodes it into the instruction word in place of the label. Poof, no more labels and definitely no pointers (which would imply another place in memory that stores a memory address).

Of course, the memory address itself corresponds to a spot in the text segment in your example because it matches to an instruction.

Simply stated, labels exist to make our lives easier. However, once they're assembled they're converted to the actual memory address of the instruction/variable that they've labeled.
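
The label-to-address step described above can be sketched as the first pass of a toy assembler. This is only an illustration, not how any particular assembler is implemented: the function name `build_symbol_table`, the simplified source syntax, and the base address 0x00400000 are assumptions made for the example.

```python
def build_symbol_table(lines, base=0x00400000):
    """Map each label to the address of the instruction it precedes.

    Assumes every instruction is 4 bytes long and that a label is any
    text before a ':' on a line (a simplification of real syntax).
    """
    symbols = {}
    address = base
    for line in lines:
        text = line.split("#", 1)[0].strip()  # drop comments
        if ":" in text:
            label, text = text.split(":", 1)
            symbols[label.strip()] = address  # the label takes the current address
            text = text.strip()
        if text:  # only real instructions occupy memory
            address += 4
    return symbols


source = [
    "again: nop",
    "       j again",
]
table = build_symbol_table(source)
print({name: hex(addr) for name, addr in table.items()})
```

The table exists only inside the assembler while it runs. Nothing in it is written to the text or data segment; only the resolved address ends up inside the encoded instruction.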

Answer 2:

The label itself is not stored anywhere. It's just a symbolic address for the assembler/linker. The machine code of the j again instruction does store the actual resulting address, as a number.

The linker glues together all object files, merging the symbols across them, filling in the correct relative addresses, and creating a relocation table for the OS loader. The result is the executable file.

When the OS loads the executable, it also loads the relocation table, patches the instructions that work with absolute addresses according to the actual address where the binary was loaded, throws the relocation table away, and executes the code.

So labels are just a "source thing" for the programmer: an alias for a particular fixed memory address, which saves the programmer from counting instruction sizes and calculating jump offsets or variable addresses in their head.

You may want to check the "list file" from your assembler (often the /l switch) when assembling some source, to see the actual machine code bytes produced (none for labels).

Your "task" code, when assembled at 0x00400000, looks like this (I set those add instructions to do t1=t1+t1 to have anything there):

Address     Code        Basic                  Source

0x00400000  0x01294820  add $9,$9,$9           4            add  $t1,$t1,$t1
0x00400004  0x01294820  add $9,$9,$9           5            add  $t1,$t1,$t1
0x00400008  0x11090004  beq $8,$9,0x00000004   6            beq  $t0, $t1, next
0x0040000c  0x1509fffc  bne $8,$9,0xfffffffc   7            bne  $t0, $t1, again
0x00400010  0x01294820  add $9,$9,$9           8            add  $t1,$t1,$t1
0x00400014  0x01294820  add $9,$9,$9           9            add  $t1,$t1,$t1
0x00400018  0x01294820  add $9,$9,$9           10           add  $t1,$t1,$t1
0x0040001c  0x08100000  j 0x00400000           11   next:   j    again

As you can see, each real instruction produces one 32-bit value, the machine-code word (loosely called the "opcode" here, for operation code). That value is visible in the column "Code". The column "Address" says where this value is stored in memory once the executable is loaded and ready to be executed. The column "Basic" shows the instructions disassembled back from the machine code, and the last column is "Source".

Now see how the conditional branches encode the relative jump distance in 16 bits. For beq $8, $9 the upper 16 bits are 0x1109 (the opcode plus the two register numbers), and the lower 16 bits, 0x0004, are a sign-extended value saying "how far to jump". That value is a number of instructions counted from the "current position", where current means the address of the instruction following the branch, i.e.

0x0040000c + 0x0004 * 4 = 0x0040001c = target address

The *4 is there because on MIPS every instruction is exactly 4 bytes long, and memory is addressed per byte, not per instruction.

The same goes for the bne that follows: its upper 16 bits are 0x1509 and the offset is 0xfffc, which is -4, so

0x00400010 + (-4) * 4 = 0x00400000
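
The two offset calculations above can be reproduced with a few lines of Python. This is a minimal sketch: the helper names `branch_offset` and `encode_branch` are made up for the example, and the addresses are the ones from the listing.

```python
def branch_offset(branch_addr, target_addr):
    """Signed offset in instructions, relative to the instruction after the branch."""
    return (target_addr - (branch_addr + 4)) // 4


def encode_branch(opcode, rs, rt, branch_addr, target_addr):
    """Pack an I-type branch: 6-bit opcode, two 5-bit registers, 16-bit offset."""
    offset = branch_offset(branch_addr, target_addr) & 0xFFFF  # two's complement
    return (opcode << 26) | (rs << 21) | (rt << 16) | offset


BEQ, BNE = 0x04, 0x05  # primary opcodes of beq and bne
T0, T1 = 8, 9          # register numbers of $t0 and $t1

beq_word = encode_branch(BEQ, T0, T1, 0x00400008, 0x0040001C)
bne_word = encode_branch(BNE, T0, T1, 0x0040000C, 0x00400000)
print(f"{beq_word:#010x} {bne_word:#010x}")  # prints "0x11090004 0x1509fffc"
```

Because only the distance between branch and target is encoded, the same two words come out no matter where the whole block is placed in memory.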

The absolute jump uses a different encoding: a 6-bit opcode 0b000010 followed by a 26-bit field holding the target address divided by four, here 0x0100000. (In the first byte, 0b000010xx, the xx are the top two bits of that field; in this example they are zero.) Every instruction must start at a 4-byte-aligned address, so it would be a waste to encode the two least significant bits; they are always 00. 0x100000 * 4 = 0x00400000. On MIPS, the j field supplies bits 2-27 of the target, bits 0-1 are zero, and bits 28-31 are copied from the pc of the instruction following the jump. So j can only reach targets inside the current 256 MiB "bank" (the upper 4 bits of pc); to jump to another bank you load the full 32-bit address into a register and use jr.
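
The j encoding can be checked in both directions with a short sketch. The function names `encode_jump` and `jump_target` are hypothetical, and the decoder assumes the upper four address bits are taken from the address of the instruction after the jump.

```python
J = 0x02  # primary opcode of j


def encode_jump(target_addr):
    """Pack a J-type word: 6-bit opcode plus bits 2-27 of the target address."""
    return (J << 26) | ((target_addr >> 2) & 0x03FFFFFF)


def jump_target(word, jump_addr):
    """Rebuild the full target from the 26-bit field and the jump's own address."""
    index = word & 0x03FFFFFF
    return ((jump_addr + 4) & 0xF0000000) | (index << 2)


word = encode_jump(0x00400000)
print(f"{word:#010x}")                           # prints "0x08100000"
print(f"{jump_target(word, 0x0040001C):#010x}")  # prints "0x00400000"
```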

Anyway, if you say again: is at 0x10010020, all of these can be recalculated to produce functional code ready to be executed at 0x10010020. The branch offsets are relative, so they don't change; the j is the tricky one, because you have to know how the full target address is composed (the upper 4 bits come from pc).
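
As a rough check of that recalculation, the sketch below encodes only the three control-flow instructions for a given address of again. It assumes the same layout as the listing (beq at +0x08, bne at +0x0c, next at +0x1c) and the register numbers 8 and 9; the function name `control_flow_words` is made up for the example.

```python
def control_flow_words(again):
    """Return the machine words of beq, bne and j when `again` is at the given address."""
    beq_addr, bne_addr, next_addr = again + 0x08, again + 0x0C, again + 0x1C

    def branch(opcode, addr, target):
        # 16-bit offset in instructions, relative to the instruction after the branch
        offset = ((target - (addr + 4)) // 4) & 0xFFFF
        return (opcode << 26) | (8 << 21) | (9 << 16) | offset

    beq = branch(0x04, beq_addr, next_addr)
    bne = branch(0x05, bne_addr, again)
    # j keeps bits 2-27 of the target; the top 4 bits must match those of the pc
    j = (0x02 << 26) | ((again >> 2) & 0x03FFFFFF)
    return beq, bne, j


for base in (0x00400000, 0x10010020):
    print(f"{base:#010x}:", " ".join(f"{w:#010x}" for w in control_flow_words(base)))
```

For 0x10010020 the two branch words stay 0x11090004 and 0x1509fffc, while the j word becomes 0x08004008. The j still lands on again because the jump itself sits at 0x1001003c, whose upper four address bits match those of the target.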

BTW, a real MIPS CPU uses delayed branching: the instruction right after a branch or jump (the delay slot) is always executed while the condition is evaluated, and the jump takes effect only after it. I think the pc used to calculate the target address is also that of the "later" instruction, so the correct code for a real MIPS would have that beq ahead of the second add, but the relative offset would still be 0x0004. :) Simple, eh? If it doesn't make sense to you, check the MARS settings (the emulation of delayed branching is switched OFF by default, to not confuse students), and search Google for a better explanation. A nice, funny little CPU, that MIPS. :)

Answer 3:

The conversion of a label to its corresponding address is done by the assembler or MIPS simulator you are using. For example, MARS is a MIPS simulator, so MARS does that conversion and finds the address of the label for you.

Source: https://stackoverflow.com/questions/42611424/mips-labels-storage-location

Tags

assembly

mips

machine-language