When is it better for an assembler to use sign extended relocation like R_X86_64_32S instead of zero extension like R_X86_64_32?

Question

As a concrete example, on GAS 2.24, moving the address:

mov $s, %eax
s:

After:

as --64 -o a.o a.S
objdump -Sr a.o

Uses zero extension:

0000000000000000 <s-0x5>:
   0:   b8 00 00 00 00          mov    $0x0,%eax
                        1: R_X86_64_32  .text+0x5

But memory access:

mov s, %eax
s:

Compiles to sign extension:

0000000000000000 <s-0x7>:
   0:   8b 04 25 00 00 00 00    mov    0x0,%eax
                        3: R_X86_64_32S .text+0x7

Is there a rationale for using either one in this specific case, or in general? I don't understand how the assembler could make any better supposition about either case.

NASM 2.10.09 just uses R_X86_64_32 for both of the above. Update: a bleeding-edge NASM commit 6377180, made after 2.11, produces the same output as GAS, so the older NASM behavior seems to have been a bug, as Ross mentioned.

I have explained what I think I understand about R_X86_64_32S at: https://stackoverflow.com/a/33289761/895245

Answer 1:

The difference is in the allowed addresses for the symbol s. In the first case, with R_X86_64_32, the address of the symbol must be in the range 0x00000000'00000000 to 0x00000000'FFFFFFFF. In the second case, with R_X86_64_32S, the address of the symbol must be either between 0x00000000'00000000 and 0x00000000'7FFFFFFF or between 0xFFFFFFFF'80000000 and 0xFFFFFFFF'FFFFFFFF, that is, it must fit in a signed 32-bit value. If s ends up with an address outside the applicable range, the linker will give an error.
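The two range rules can be sketched in a few lines of Python. This is only an illustration, not code from any real linker: the function names are made up, addresses are treated as unsigned 64-bit integers, and the value checked is assumed to be the final computed one (symbol plus addend).

```python
def fits_r_x86_64_32(addr: int) -> bool:
    """True if the 64-bit value is unchanged after truncating to 32 bits
    and zero-extending back to 64 bits (the R_X86_64_32 rule)."""
    return 0 <= addr <= 0xFFFF_FFFF


def fits_r_x86_64_32s(addr: int) -> bool:
    """True if the 64-bit value is unchanged after truncating to 32 bits
    and sign-extending back to 64 bits (the R_X86_64_32S rule)."""
    low_half = 0 <= addr <= 0x7FFF_FFFF
    high_half = 0xFFFF_FFFF_8000_0000 <= addr <= 0xFFFF_FFFF_FFFF_FFFF
    return low_half or high_half


if __name__ == "__main__":
    for addr in (0x7FFF_FFFF, 0x8000_0000, 0xFFFF_FFFF_8000_0000):
        print(hex(addr), fits_r_x86_64_32(addr), fits_r_x86_64_32s(addr))
```

Note that 0x00000000'80000000 is the interesting case: it passes the unsigned check but fails the signed one.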

This corresponds to how the CPU interprets the 32-bit value of s encoded into the two instructions. In the first instruction, where it's an immediate operand, the 32-bit value is zero extended into RAX. In the second instruction the 32-bit value is a displacement in a memory operand, and so is sign extended to form a 64-bit address.

NASM shouldn't be using the unsigned R_X86_64_32 relocation for the second instruction. It's not a question of which one is better; using R_X86_64_32 here is simply incorrect. NASM would permit the address of s to be 0x00000000'80000000, but the CPU would end up accessing 0xFFFFFFFF'80000000 instead.
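To make the mismatch concrete, here is a small Python model of what happens to the 32-bit field stored in the instruction. The helper names are hypothetical and this only models the arithmetic, not a real CPU or linker.

```python
MASK64 = (1 << 64) - 1


def zero_extend32(value: int) -> int:
    """Widen a 32-bit field to 64 bits by filling the upper half with zeros."""
    return value & 0xFFFF_FFFF


def sign_extend32(value: int) -> int:
    """Widen a 32-bit field to 64 bits by copying bit 31 into the upper half."""
    value &= 0xFFFF_FFFF
    if value & 0x8000_0000:
        value -= 1 << 32  # reinterpret as a negative two's-complement number
    return value & MASK64


if __name__ == "__main__":
    symbol_address = 0x0000_0000_8000_0000
    stored_field = symbol_address & 0xFFFF_FFFF  # 32 bits placed in the instruction
    # What an R_X86_64_32 check assumes the CPU will use:
    print(hex(zero_extend32(stored_field)))
    # What the CPU actually uses for a disp32 memory operand:
    print(hex(sign_extend32(stored_field)))
```

The first value is the address the unsigned relocation promises; the second is the address the memory operand really refers to, which is why the signed relocation is the right one for a displacement.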

Answer 2:

With the immediate-data mov, the assembler is just doing what you wrote. Writing to a 32-bit register always zeroes the upper 32 bits of the full 64-bit register in x86-64. As documented in the Intel instruction reference manual:

MOV r/m64, imm32 means: Move imm32 sign extended to 64-bits to r/m64.
MOV r/m32, imm32 means: Move imm32 to r/m32.
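As a rough model of these two forms with a register destination (function names are made up for illustration, and register values are plain Python integers):

```python
MASK32 = 0xFFFF_FFFF


def mov_r32_imm32(old_reg64: int, imm32: int) -> int:
    """Model MOV r32, imm32: the low 32 bits are replaced and the upper
    32 bits of the 64-bit register are cleared, whatever they held before."""
    # old_reg64 is accepted only to show that its contents do not matter.
    return imm32 & MASK32


def mov_r64_imm32(imm32: int) -> int:
    """Model MOV r64, imm32: the immediate is sign-extended to 64 bits."""
    imm32 &= MASK32
    if imm32 & 0x8000_0000:
        return imm32 | 0xFFFF_FFFF_0000_0000
    return imm32


if __name__ == "__main__":
    print(hex(mov_r32_imm32(0xFFFF_FFFF_FFFF_FFFF, 0x8000_0000)))
    print(hex(mov_r64_imm32(0x8000_0000)))
```

This is why the relocation for the immediate depends on the destination width you wrote: a 32-bit destination pairs with the zero-extending relocation, a 64-bit destination with the sign-extending one.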

If you wanted sign-extension to match how 32bit addresses are treated in 32bit-absolute addressing modes, you should have written

mov $s, %rax

32bit displacements are always sign-extended. So I think Ross's answer is right, that NASM 2.10.09 is buggy. It's apparently telling the linker that the address will be zero-extended, when in fact it will be sign-extended. Of course, RIP-relative addressing takes fewer instruction bytes, so it should be preferred over absolute addressing when possible.

Source: https://stackoverflow.com/questions/33318342/when-is-it-better-for-an-assembler-to-use-sign-extended-relocation-like-r-x86-64

Tags

assembly

nasm

x86-64

elf

gas