What is the purpose of the assembler and symbol table? What is at a symbol's address?

Question

From my textbook:

To produce the binary version of each instruction in the assembly language program, the assembler must determine the addresses corresponding to all labels. Assemblers keep track of labels used in branches and data transfer instructions in a symbol table. As you might expect, the table contains pairs of symbols and addresses.

Why does it need a symbol table? If we have a symbol table with a label name and an address, what is the use of the address? What is at the address... just the name of the label? Or is it the instructions of the label?

Say we have an instruction like this in assembly MIPS:

add_numbers:
   addi $s0, $t0, 2

Why wouldn't the symbol table just store add_numbers | <the_binary_representation_of_the_instruction> instead of add_numbers | <address_location_of_label>?

Answer 1:

A label IS an address. It is a way for programmers to give the assembler an address without having to know the physical address. Let the toolchain do that work for you.

I don't remember my MIPS offhand, so here is some pseudocode.

loop_top:
   nop
   nop
   sub r0,1
   cmp r0,0
   bne loop_top

It depends on the instruction set, but in general a conditional branch is pc-relative. The assembler keeps tables and makes one or more passes over them to resolve the distance between the branch and its destination, so that the branch can be encoded completely. For most instruction sets the code above can be resolved in one pass. loop_top is a label that will have an address, but because the branch here is pc-relative you don't need to know the physical address.
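As a rough illustration of that resolution step, here is a minimal two-pass sketch in Python for the pseudocode above. Everything about it is an assumption made for the example: the instruction set is the made-up one from the pseudocode, every instruction is taken to be 4 bytes, and the branch offset is counted in instructions from the instruction following the branch (the convention MIPS uses). The function name assemble is hypothetical.

```python
def assemble(lines, word_size=4):
    """Toy two-pass assembler for the pseudocode ISA (assumed 4-byte instructions)."""
    symbols = {}        # label name -> address within this file
    instructions = []   # (address, source text)
    addr = 0

    # Pass 1: count bytes and record the address of every label.
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            symbols[line[:-1]] = addr
        else:
            instructions.append((addr, line))
            addr += word_size

    # Pass 2: replace label operands of branches with pc-relative offsets.
    resolved = []
    for addr, text in instructions:
        mnemonic, _, operand = text.partition(" ")
        operand = operand.strip()
        if mnemonic == "bne" and operand in symbols:
            # Assumed convention: offset in instructions, relative to the
            # instruction after the branch.
            offset = (symbols[operand] - (addr + word_size)) // word_size
            resolved.append((addr, f"bne {offset:+d}"))
        else:
            resolved.append((addr, text))
    return symbols, resolved


program = ["loop_top:", "nop", "nop", "sub r0,1", "cmp r0,0", "bne loop_top"]
symbols, resolved = assemble(program)
print(symbols)        # {'loop_top': 0}
print(resolved[-1])   # (16, 'bne -5')
```

The symbol table here holds only a name and an address. The branch never needs the final physical address, just the distance to the label.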

But

   call my_fun

After making a pass over the code, the assembler finds that my_fun is not defined in this file, and/or the assembly language has some syntax to mark it as external before it is used. Either way it is external and cannot be resolved when this file is assembled. So the object needs tables recording the label name and where in the object that instruction lives. Depending on the assembler, it may fill in the offset or full address as zero for now, or encode the instruction as an infinite loop.

The linker later determines the actual addresses for things in the processor's memory space. While linking, it builds a table of all the labels relevant at this phase of the toolchain and their addresses. It then goes back into the code and repairs/creates the machine code for this call instruction, now that it knows the actual address for that label.

j hello

the object:

Disassembly of section .text:

00000000 <.text>:
   0:   08000000    j   0x0
   4:   00000000    nop

another object:

.globl hello
hello:
    j hello

.word hello

link them

Disassembly of section .text:

00001000 <_ftext>:
    1000:   08000402    j   1008 <hello>
    1004:   00000000    nop

00001008 <hello>:
    1008:   08000402    j   1008 <hello>
    100c:   00000000    nop
    1010:   00001008    0x1008

While these are still objects, all the toolchain has to go on is the label hello being used as an address to be resolved later. In this case that happens at link time: the linker works through the objects, counting bytes and making a table of labels and their addresses. During the first or some later pass it changes the instructions or data as needed to resolve these labels.
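A minimal sketch of that link step, in Python, assuming a very simplified object format. The dictionaries, the relocation kind names "jump26" and "word32", and the link function are all invented for this example (they only loosely mirror what a real ELF linker does), and the code base address 0x1000 is taken from the disassembly above. The MIPS j encoding used is opcode 2 in the top six bits with the target address shifted right by two in the low 26 bits; the sketch ignores the requirement that the target lie in the same 256 MB region as the jump.

```python
J_OPCODE = 0x08000000  # MIPS `j`: opcode 2 in the top six bits


def link(objects, base=0x1000):
    """Toy linker: lay objects out back to back, then patch relocations."""
    symbols = {}   # label name -> final address
    placed = []    # (start address, object)
    addr = base

    # Pass 1: count bytes, placing each object and recording its labels.
    for obj in objects:
        placed.append((addr, obj))
        for name, offset in obj["symbols"].items():
            symbols[name] = addr + offset
        addr += 4 * len(obj["words"])

    # Pass 2: go back and fix up every instruction or data word that
    # referred to a label.
    image = {}     # address -> 32-bit word
    for start, obj in placed:
        words = list(obj["words"])
        for offset, kind, name in obj["relocs"]:
            target = symbols[name]
            index = offset // 4
            if kind == "jump26":      # hypothetical name for a `j` fix-up
                words[index] = J_OPCODE | ((target >> 2) & 0x03FFFFFF)
            elif kind == "word32":    # hypothetical name for a `.word` fix-up
                words[index] = target
        for i, word in enumerate(words):
            image[start + 4 * i] = word
    return symbols, image


# First object: `j hello` (plus delay-slot nop); hello is external here.
first = {"words": [0x08000000, 0x00000000], "symbols": {},
         "relocs": [(0, "jump26", "hello")]}
# Second object: defines hello, jumps to it, and stores its address.
second = {"words": [0x08000000, 0x00000000, 0x00000000],
          "symbols": {"hello": 0},
          "relocs": [(0, "jump26", "hello"), (8, "word32", "hello")]}

symbols, image = link([first, second])
print(hex(symbols["hello"]))       # 0x1008
print(f"{image[0x1000]:08x}")      # 08000402
print(f"{image[0x1010]:08x}")      # 00001008
```

The patched words match the linked disassembly: both j instructions become 08000402 and the .word becomes 00001008.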

Old-school assemblers did the job of both assembling and linking from the same source file, and for those the statement "assembler must determine the addresses corresponding to all labels" fits. With commonly used toolchains it is generally not the assembler that does the linker's work, so that quoted statement could use some improvement. But hopefully this demonstrates that labels are addresses: they represent a yet-to-be-determined address, so the code is easier to write than something like this

  nop
  nop
  j pc-2

then if you add another instruction

  nop
  add r0,r1
  nop
  j pc-3

   j 0x1008

Without labels you have to spend a significant amount of time rewriting the program to get each and every address hardcoded into it. Add or remove a single line and a lot of other code has to change. Labels representing addresses make all of that significantly easier: the toolchain determines the addresses, then goes back and replaces the labels with addresses, basically.

Added a nop:

Disassembly of section .text:

00001000 <_ftext>:
    1000:   08000403    j   100c <hello>
    1004:   00000000    nop
    1008:   00000000    nop

0000100c <hello>:
    100c:   08000403    j   100c <hello>
    1010:   00000000    nop
    1014:   0000100c

If we didn't have labels and had to hardcode the addresses instead, you would have to change those three places as a result of the nop. That is for one line. If you added dozens of lines, or hundreds, how would you keep track of it all? By putting labels in comments? You would assemble, disassemble, and patch up the source over and over again until it looked somewhat right, and hope for no bugs.
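To make the effect of that one extra nop concrete, here is a small Python sketch that recomputes label addresses by counting bytes. It assumes every non-label line occupies 4 bytes, that the delay-slot nops are written out explicitly, and that code starts at 0x1000 as in the listings above; the helper names label_addresses and encode_j are invented for the example.

```python
def label_addresses(lines, base=0x1000, word_size=4):
    """Count bytes from `base` and record the address of each label."""
    table = {}
    addr = base
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            table[line[:-1]] = addr
        else:
            addr += word_size   # assumed: every instruction or .word is 4 bytes
    return table


def encode_j(target):
    """MIPS `j` encoding: opcode 2, then the target address shifted right by 2."""
    return 0x08000000 | ((target >> 2) & 0x03FFFFFF)


before = ["j hello", "nop", "hello:", "j hello", "nop", ".word hello"]
after = ["j hello", "nop", "nop", "hello:", "j hello", "nop", ".word hello"]

for source in (before, after):
    hello = label_addresses(source)["hello"]
    print(f"hello = {hello:#x}, j hello = {encode_j(hello):08x}")
# hello = 0x1008, j hello = 08000402
# hello = 0x100c, j hello = 08000403
```

Only the source list changed between the two runs. Every use of hello picked up the new address without anyone editing it by hand.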

mips-elf-readelf -s so.elf

Symbol table '.symtab' contains 14 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 00001000     0 SECTION LOCAL  DEFAULT    1 
     2: 00400000     0 SECTION LOCAL  DEFAULT    2 
     3: 00400018     0 SECTION LOCAL  DEFAULT    3 
     4: 00000000     0 SECTION LOCAL  DEFAULT    4 
     5: 0000a010     0 NOTYPE  LOCAL  DEFAULT    2 _gp
     6: 00002018     0 NOTYPE  GLOBAL DEFAULT    4 _fdata
     7: 0000100c     0 OBJECT  GLOBAL DEFAULT    1 hello
     8: 00001000     0 NOTYPE  GLOBAL DEFAULT    1 _ftext
     9: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND _start
    10: 00002018     0 NOTYPE  GLOBAL DEFAULT    2 __bss_start
    11: 00002018     0 NOTYPE  GLOBAL DEFAULT    2 _edata
    12: 00002018     0 NOTYPE  GLOBAL DEFAULT    2 _end
    13: 00002018     0 NOTYPE  GLOBAL DEFAULT    2 _fbss

And here is the entry of interest:

     7: 0000100c     0 OBJECT  GLOBAL DEFAULT    1 hello

The label hello, once assembled and linked into a final binary, is equal to address 0x100C.

Source: https://stackoverflow.com/questions/57984652/what-is-the-purpose-of-the-assembler-and-symbol-table-what-is-at-a-symbols-add

Tags

assembly

mips