Why does the linker modify a --defsym “absolute address”?

时光总嘲笑我的痴心妄想, submitted on 2019-12-01 16:09:00

You appear to have fundamentally misunderstood what --defsym does.

--defsym=symbol=expression
   Create a global symbol in the *output* file, ...

That is, you are creating the new symbol in the library that you are building. As such, the symbol is (naturally) relocated with the library.

I am guessing you want something like this instead:

// code in library
int fn()
{
    // exe_fn not exported from the executable, but we know where it is.
    int (*exe_fn)(void) = (int (*)(void)) 0x432238;
    return (*exe_fn)();
}

If you don't want to hard-code 0x432238 into the library and would rather pass the value on the command line at build time, compile with -DEXE_FN=0x432238 and use EXE_FN in place of the literal.
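A minimal sketch of that -DEXE_FN variant, assuming the library source is compiled with -DEXE_FN=0x432238. The names exe_fn_t, call_at and sample_target are hypothetical and only there for illustration; the address is meaningful only in a process where the executable really has a matching function at that location.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type of the function we want to reach in the executable. */
typedef int (*exe_fn_t)(void);

/* Turn a raw address into a callable pointer and call it.
 * Converting an integer to a function pointer is implementation-defined
 * in C, but it is what the hard-coded version above does as well. */
int call_at(uintptr_t addr)
{
    assert(addr != 0);
    exe_fn_t target = (exe_fn_t) addr;
    return target();
}

#ifdef EXE_FN
/* Only compiled when the address is supplied, e.g. -DEXE_FN=0x432238. */
int fn(void)
{
    return call_at((uintptr_t) EXE_FN);
}
#endif

/* Stand-in target so call_at can be tried without the real executable. */
int sample_target(void)
{
    return 42;
}
```

Because the address is an ordinary integer constant in the code rather than a symbol, neither the static linker nor the dynamic loader has anything to relocate.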

Update:

Goal: a shared library to use a function from an executable

That goal cannot be achieved by the method you selected. You'll have to use other means.

Why the "absolute address" is modified?

It isn't. When you ask the linker to define function at absolute address 0x432238, it does exactly that. You can see it in objdump, nm and readelf -s output.

But because the symbol is defined in the shared library, all references to that symbol are relocated, i.e. adjusted by the shared library load address (that is done by the dynamic loader). It makes no sense whatsoever for the dynamic loader to do otherwise.
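The adjustment itself is plain arithmetic. As a rough sketch (the function name and the load base used below are made-up illustrations, not something ld or the loader exposes under these names):

```c
#include <stdint.h>

/* What code in the library ends up seeing for a relocated reference:
 * the value recorded in the symbol table plus the address at which the
 * dynamic loader happened to map the library. */
uint64_t relocated_address(uint64_t load_base, uint64_t symbol_value)
{
    return load_base + symbol_value;
}
```

With the symbol value 0x432238 and a hypothetical load base of 0x7f0000000000, the reference resolves to 0x7f0000432238 rather than 0x432238.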

How to avoid it?

You can't. Use other means to achieve your goal.

Adding a counterpoint: yes, there is an actual use for this, but I think it is indeed broken, not only with dynamic libraries but also with position-independent executables.

ld itself uses such symbols when it embeds binary files into executables:

ld -r -b binary hello_world.txt -o hello_world.o

this will produce an object file with, among others, the following symbols:

000000000000000c g       .data  0000000000000000 _binary_hello_world_txt_end
000000000000000c g       *ABS*  0000000000000000 _binary_hello_world_txt_size
0000000000000000 g       .data  0000000000000000 _binary_hello_world_txt_start

so that an executable that includes them can just use extern variables to access them. (As in: our "hello world" text from hello_world.txt is the only thing in the .data section, with length 0xc.)

Linking this object file into an executable file (and not stripping symbols) results in

0000000000411040 g     .data  0000000000000000              _binary_hello_world_txt_start
000000000041104c g     .data  0000000000000000              _binary_hello_world_txt_end
000000000000000c g     *ABS*  0000000000000000              _binary_hello_world_txt_size

and we can do things like

extern char _binary_hello_world_txt_start;
extern char _binary_hello_world_txt_size; // "char" is just made up in this one

// (...)
printf("text: %s\n", &_binary_hello_world_txt_start);
printf("number of bytes in it: %d\n", (int) (&_binary_hello_world_txt_size));

(Yes, it looks fairly weird that we take the address of something (which is what symbols are usually used for) and then treat it as an integer... but it actually works.)

Note also how the linker does know what it should relocate and what it shouldn't; the data pointers are relative to .data, while the size is *ABS*, which, as Gil describes, is not supposed to be relocated (... since it isn't calculated relatively to anything).

However, this only works in non-position-independent executables. Once you go from -no-pie to -fPIE (which looks to be gcc's default on modern Linux distros), the dynamic linker relocates everything, including *ABS* symbols. This happens at run time, during dynamic linking: the symbol tables look the same regardless of how the executable was compiled.
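One way to sidestep the relocated size symbol, offered as a sketch and not as a fix for the underlying behaviour: the _start and _end symbols are both moved by the same amount, so their difference is still the length. blob_length is a hypothetical helper name, and sample_blob is only a stand-in for the linked-in data so that the snippet compiles without hello_world.o.

```c
#include <stddef.h>

/* In a real build these would come from the object produced by
 * `ld -r -b binary hello_world.txt -o hello_world.o`:
 *
 *   extern const char _binary_hello_world_txt_start[];
 *   extern const char _binary_hello_world_txt_end[];
 */

/* Length of the embedded blob. Whatever offset the loader added to both
 * addresses cancels out in the subtraction. */
size_t blob_length(const char *start, const char *end)
{
    return (size_t) (end - start);
}

/* Stand-in for the embedded data: 12 bytes (0xc), no terminating NUL.
 * The actual file contents are an assumption made for this example. */
const char sample_blob[12] = {
    'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\n'
};
```

The embedded data is not NUL-terminated, so printing it with something like printf("%.*s", (int) blob_length(start, end), start) is safer than a plain %s.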

That the same thing happens for shared libraries seems to have the same cause: relocating a dynamically placed binary (either a position-independent executable or a shared library) adjusts every symbol alike, which makes sense for functions included in the binary itself, but not for *ABS* data.

Sadly, I don't have an answer to either of the questions: I also think it's done incorrectly, and I do not know how to fix it (see Getting the value of *ABS* symbols from C for another issue bumping into the same problem).

However, given how even GNU ld itself chooses to embed a size as a symbol this way... I do think this application / question is entirely valid, so, as for an answer:

  • ... it's done because the implementation isn't actually correct
  • as a workaround, "generating a header file with absolute addresses inline" comes to mind, following Employed Russian's answer
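As a rough sketch of that header-generation workaround: a small build-time helper could turn an address (for example one read from nm output for the executable) into a #define line. The function name, the macro name and the output format are all assumptions made for illustration.

```c
#include <stdio.h>

/* Format one line of a generated header into buf.
 * Returns what snprintf returns: the number of characters the full line
 * needs, not counting the terminating NUL. */
int format_address_define(char *buf, size_t size,
                          const char *macro, unsigned long address)
{
    return snprintf(buf, size, "#define %s 0x%lxUL\n", macro, address);
}
```

Called with "EXE_FN" and 0x432238, this yields the line `#define EXE_FN 0x432238UL`, which the library can then include and use as a plain constant.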

... but I'd actually be interested in how exactly to patch the relocation table the way Gil mentioned in the question!
