Understanding ARM relocation (example: str x0, [tmp, #:lo12:zbi_paddr])

Question

I found this line of assembly in the Zircon kernel's start.S

str     x0, [tmp, #:lo12:zbi_paddr]

for ARM64. I also found that zbi_paddr is defined in C++:

extern paddr_t zbi_paddr;

So I started looking into what #:lo12: means.

I found https://stackoverflow.com/a/38608738/6655884 which looks like a great explanation, but it does not explain the very basics: what a relocation is and why these things are needed.

I guess that since zbi_paddr is defined in start.S and used in C++ code, and since start.S is assembled into an object file start.o with addresses starting at 0, the linking process will have to relocate all addresses there to addresses in the final executable file.

In order to keep track of the symbols that need relocation, ELF stores these structs, as said in the answer:

typedef struct
{
    Elf64_Addr r_offset;    /* Address of reference */
    Elf64_Xword r_info;     /* Symbol index and type of relocation */
} Elf64_Rel;

typedef struct
{
    Elf64_Addr r_offset;    /* Address of reference */
    Elf64_Xword r_info;     /* Symbol index and type of relocation */
    Elf64_Sxword r_addend;  /* Constant part of expression */
} Elf64_Rela;

So for example, r_offset would store the address of zbi_paddr in the final executable. Then, when the program is loaded, the loader looks at these structs and fills in the address of zbi_paddr from the C++ code.

After that, I completely lost track of why things like S, A, P, X, abs_g0_s and lo12 are needed. The answer says it's related to instructions not being able to insert 64 bits into registers. Can someone give me more context? I don't understand: there are already ways to insert 64 bits into registers. And how is this related to relocation?

Answer 1:

The underlying issue is that ARM64 instructions are all 32 bits in size, which limits the number of bits of immediate data that can be encoded in any one instruction. You certainly cannot encode 64 bits of address, or even 32 bits.

The code and static data of the kernel can be expected to be well under 4 GB, so in order to store data in the static variable zbi_paddr, the programmer can write the following two instructions (including the preceding one which you omitted but is crucial). Note that tmp is a macro defined above as x9, so the code expands to:

adrp    x9, zbi_paddr
str     x0, [x9, #:lo12:zbi_paddr]

Now when linking occurs, the linker will know the layout of the entire kernel, and the relative locations of all symbols. This scheme supports position-independent code, so the absolute addresses need not be known, but we will certainly know the displacement between zbi_paddr and the adrp instruction above, which will fit in a signed 33-bit value (a range of ±4 GB), as well as the offset of zbi_paddr within its 4KB page (since the kernel will necessarily be loaded at a page-aligned address).

So bits 12 and higher of this displacement will be encoded into the adrp instruction, which has a 21-bit immediate field. adrp will sign-extend it, add it to the corresponding bits of the program counter, and place the result in x9. Then x9 will contain bits 63-12 of the absolute address of zbi_paddr, with the low 12 bits being zeroed.
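To make the arithmetic concrete, here is a minimal C sketch of the page computation just described. The function names are made up for illustration, and the addresses mentioned afterwards are hypothetical; none of this is taken from the Zircon source.

```c
#include <assert.h>
#include <stdint.h>

#define LOW12_MASK UINT64_C(0xFFF)  /* offset within a 4 KB page */

/* Value the linker would place in adrp's 21-bit field: the signed number
   of 4 KB pages from the page holding pc to the page holding target. */
int64_t adrp_page_delta(uint64_t pc, uint64_t target)
{
    int64_t bytes = (int64_t)((target & ~LOW12_MASK) - (pc & ~LOW12_MASK));
    int64_t pages = bytes / 4096;  /* exact: both operands are page-aligned */
    /* Must fit in a signed 21-bit field, i.e. target within about +/-4 GB. */
    assert(pages >= -(INT64_C(1) << 20) && pages < (INT64_C(1) << 20));
    return pages;
}

/* What adrp leaves in its destination register at run time:
   the page of pc plus the sign-extended immediate shifted left by 12. */
uint64_t adrp_result(uint64_t pc, int64_t page_delta)
{
    return (pc & ~LOW12_MASK) + ((uint64_t)page_delta << 12);
}
```

For example, with the adrp at the made-up address 0xffff000000101234 and zbi_paddr at 0xffff000000345678, the page delta is 0x244, and adrp produces 0xffff000000345000: the address of zbi_paddr with its low 12 bits cleared. Only the difference between the two pages matters, which is why the same encoding keeps working if the whole image is loaded at a different page-aligned base.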

The 12-bit offset of zbi_paddr within its page will be encoded into the 12-bit immediate field of the str instruction (for a 64-bit store the field is scaled by the access size, so what is actually stored there is the offset divided by 8). The instruction adds this offset to the value in x9, which yields the address of zbi_paddr, and it stores x0 at that address. So we have managed to store a value in zbi_paddr with just two instructions.
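The second half of the pair can be sketched the same way. This is only an illustration with invented function names and a hypothetical address. The one detail worth spelling out is that the unsigned-offset form of a 64-bit str scales its immediate by 8, so the field holds the page offset divided by 8.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of an address within its 4 KB page: the value of #:lo12:sym. */
uint32_t lo12(uint64_t addr)
{
    return (uint32_t)(addr & 0xFFF);
}

/* Value held in the imm12 field of a 64-bit "str Xt, [Xn, #offset]". */
uint32_t str64_imm12_field(uint64_t addr)
{
    uint32_t offset = lo12(addr);
    assert((offset & 7) == 0);  /* assumes the variable is 8-byte aligned */
    return offset >> 3;
}

/* Address the str accesses: base register plus the scaled immediate. */
uint64_t str64_effective_address(uint64_t base, uint32_t imm12_field)
{
    return base + ((uint64_t)imm12_field << 3);
}
```

With zbi_paddr at the hypothetical address 0xffff000000345678, lo12 gives 0x678 and the encoded field is 0xCF. Adding 0xCF * 8 to the page address left in x9 by adrp gives back 0xffff000000345678.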

To support this, the object file produced by assembling our code needs to instruct the linker that bits 32-12 of the displacement need to be inserted into the adrp instruction, and bits 11-0 of the address of zbi_paddr need to be inserted into the str instruction. These instructions to the linker are what relocations are; they'll contain a reference to the symbol whose address is to be encoded (here zbi_paddr) and what specifically is to be done with it. ELF supports relocations specifically designed for these instructions, that put just the right bits in the right place in the instruction word.
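As a rough sketch of what "putting the right bits in the right place" means, the following C functions patch the two instruction words the way I understand the relocations R_AARCH64_ADR_PREL_PG_HI21 (for adrp) and R_AARCH64_LDST64_ABS_LO12_NC (for a 64-bit ldr/str with :lo12:) to work. The function names are hypothetical, and a real linker also handles addends, overflow checks and many other relocation types.

```c
#include <assert.h>
#include <stdint.h>

/* adrp splits its 21-bit immediate: the low 2 bits go into immlo
   (bits 30:29), the upper 19 bits into immhi (bits 23:5). */
uint32_t patch_adrp(uint32_t insn, int64_t page_delta)
{
    assert(page_delta >= -(INT64_C(1) << 20) && page_delta < (INT64_C(1) << 20));
    uint32_t imm   = (uint32_t)page_delta & 0x1FFFFF;  /* two's complement, 21 bits */
    uint32_t immlo = imm & 0x3;
    uint32_t immhi = imm >> 2;
    insn &= ~((UINT32_C(0x3) << 29) | (UINT32_C(0x7FFFF) << 5));
    return insn | (immlo << 29) | (immhi << 5);
}

/* A 64-bit ldr/str (unsigned offset) keeps its immediate in bits 21:10;
   bits 11:3 of the symbol address are placed there. */
uint32_t patch_ldst64_lo12(uint32_t insn, uint64_t sym_addr)
{
    uint32_t field = (uint32_t)((sym_addr & 0xFFF) >> 3);
    insn &= ~(UINT32_C(0xFFF) << 10);
    return insn | (field << 10);
}
```

Before linking, the assembler emits "adrp x9, ..." as 0x90000009 and "str x0, [x9, ...]" as 0xF9000120, both with a zero immediate. With a page delta of 0x244 and a symbol address ending in 0x678 (made-up values), the patched words become 0x90001229 and 0xF9033D20.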

It's true that there are other ways to get a 64-bit value into a register. For instance, it can be placed in the literal pool, which is an area of data close enough to the corresponding code that it can be reached with a single ldr instruction (with PC-relative displacement). You could have a relocation telling the linker to insert the absolute address of zbi_paddr in the literal pool. But loading it requires an additional memory access, which is slower than adrp; moreover, the 8 bytes of literal, plus the ldr, plus the str to actually do the store, add up to a total of 16 bytes of memory needed. The adrp/str approach only needs 8, and it works better with position-independent code, where the linker may not actually know the absolute address of zbi_paddr.

If you don't like the load from memory, you can get the absolute address of zbi_paddr into a register with up to four mov/movk instructions, loading 16 bits at a time. There are relocations for that, too. But with the final str, we are using up to 20 bytes of code; executing five instructions takes more clock cycles than two; and there's still a problem with position-independent code.
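The mov/movk alternative can be modelled in a few lines of C. This is a sketch with invented helper names: movz writes one 16-bit chunk and clears the rest of the register, and each movk overwrites one further chunk while keeping the others.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit chunk number index (0 = lowest) of a 64-bit value. */
uint16_t halfword(uint64_t value, int index)
{
    return (uint16_t)(value >> (16 * index));
}

/* movz: place imm16 at the given shift and zero every other bit. */
uint64_t movz(uint16_t imm16, int shift)
{
    return (uint64_t)imm16 << shift;
}

/* movk: replace 16 bits at the given shift, keep the rest of reg. */
uint64_t movk(uint64_t reg, uint16_t imm16, int shift)
{
    assert(shift == 0 || shift == 16 || shift == 32 || shift == 48);
    return (reg & ~(UINT64_C(0xFFFF) << shift)) | ((uint64_t)imm16 << shift);
}

/* Build a full 64-bit address with one movz and three movk steps. */
uint64_t build_address(uint64_t addr)
{
    uint64_t reg = movz(halfword(addr, 0), 0);
    for (int i = 1; i < 4; i++)
        reg = movk(reg, halfword(addr, i), 16 * i);
    return reg;
}
```

For the hypothetical address 0xffff000000345678 the four chunks are 0x5678, 0x0034, 0x0000 and 0xffff. Each of the four instructions needs its own relocation telling the linker which 16-bit slice of the absolute address to insert, and because the slices are absolute they cannot survive the image being moved without being patched again.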

As such, adrp/str, with :lo12: as noted, is the standard accepted method for accessing a global or static variable. If you want to load instead of store, you use adrp/ldr. And if you want the address of zbi_paddr in a register, you do

adrp x9, zbi_paddr
add x9, x9, #:lo12:zbi_paddr

The add instruction also supports a 12-bit immediate, precisely for this purpose.

These features are explained in the GNU assembler manual.

Source: https://stackoverflow.com/questions/64838776/understanding-arm-relocation-example-str-x0-tmp-lo12zbi-paddr

Tags

assembly

arm

arm64

linker-scripts