Question
The addresses of string literals are determined at build time. Each address and the string literal itself can be found in the built executable (in ELF format). For example, the following code prints String Literal: 0x400674:
printf("String Literal: %p\n", (void *)"Hello World");
And objdump -s -j .rodata test1 shows:
Contents of section .rodata:
400670 01000200 48656c6c 6f20576f 726c6400 ....Hello World.
....
So it looks like I can get the virtual address of "Hello World" by reading the executable program itself.
Question: How can I build a table (map/dictionary) from the address of each string literal to the string itself by reading the ELF file?
I am trying to write a standalone Python script or C++ program that reads the ELF file and generates the table. It's OK if the table contains extra mappings (that are not string literals), as long as it contains the complete mapping of string literals.
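A minimal sketch of the table-building step, assuming you already have the raw bytes of a read-only section such as .rodata and its load address (sh_addr): every NUL-terminated run of printable characters is recorded as an (address, string) pair. The names str_entry and scan_section_strings are hypothetical, and the printable-run and min_len rules are heuristics, so the table may contain extra entries that are not string literals (which the question says is acceptable).

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* One (virtual address, string) pair; str points into the section buffer. */
struct str_entry {
    uint64_t addr;
    const char *str;
};

/* Scan a section's raw bytes, loaded at virtual address base_addr, and record
 * every NUL-terminated run of at least min_len printable characters (tab and
 * newline are also accepted, since literals such as "%p\n" contain them).
 * Writes at most max_out entries to out and returns how many were written. */
size_t scan_section_strings(const unsigned char *data, size_t size,
                            uint64_t base_addr, size_t min_len,
                            struct str_entry *out, size_t max_out)
{
    size_t count = 0;
    size_t start = 0;              /* start of the current candidate run */

    for (size_t i = 0; i < size; i++) {
        if (data[i] == '\0') {
            /* The run [start, i) is terminated by NUL: keep it if long enough. */
            if (i - start >= min_len && count < max_out) {
                out[count].addr = base_addr + start;
                out[count].str = (const char *)&data[start];
                count++;
            }
            start = i + 1;
        } else if (!isprint(data[i]) && data[i] != '\t' && data[i] != '\n') {
            /* Non-printable byte: the current run cannot be a text string. */
            start = i + 1;
        }
    }
    return count;
}
```

With the 16 bytes shown in the objdump output above and a base of 0x400670, this returns a single entry: 0x400674 → "Hello World".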
Answer 1:
I am not sure your question always makes sense. The details are implementation-specific (they depend on the operating system, the compiler, and the compilation flags).
First, a compiler that sees both the "abcd" and "cd" string literals in the same translation unit is permitted (but not required) to share their storage and use "abcd"+2 as the second one. See this answer.
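Because of this sharing, a table keyed only by where each stored string starts would have no entry for an address such as "abcd"+2. A minimal sketch of a lookup that avoids the problem, assuming you have a section's raw bytes and its load address: accept any address inside the section and read up to the next NUL. lookup_string_at is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the NUL-terminated string that starts at virtual address addr inside
 * a section loaded at base_addr, or NULL if addr lies outside the section or
 * no NUL follows it within the section. addr does not have to be the start of
 * a stored string, so suffix-shared literals like "abcd"+2 resolve correctly. */
const char *lookup_string_at(const unsigned char *data, size_t size,
                             uint64_t base_addr, uint64_t addr)
{
    if (addr < base_addr || addr - base_addr >= size)
        return NULL;
    size_t off = (size_t)(addr - base_addr);
    /* memchr keeps the search inside the section's bounds. */
    if (memchr(data + off, '\0', size - off) == NULL)
        return NULL;
    return (const char *)(data + off);
}
```

For a section holding "abcd\0" at 0x1000, looking up 0x1002 yields "cd", while 0x1000 yields "abcd".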
Then, in ELF files, strings are simply initialized read-only data (often in the .rodata or .text section of the text segment), and they could happen to be identical to some non-string constants. ELF files do not keep any type information (except DWARF debug information when compiled with -g). For example, the following
const uint8_t constable[] = { 0x68, 0x65, 0x6c, 0x6c, 0x6f, 0 };
has exactly the same machine representation as the "hello" string literal, but it is not a string literal in the source. Even worse, some bytes of machine code could happen to look like strings.
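A small check of this claim, using a hypothetical helper name: the byte array and the "hello" literal (6 bytes, including the terminating NUL) compare equal with memcmp, so nothing in the bytes themselves distinguishes the constant from a string literal.

```c
#include <stdint.h>
#include <string.h>

/* A non-string constant whose bytes happen to spell "hello" plus a NUL. */
static const uint8_t constable[] = { 0x68, 0x65, 0x6c, 0x6c, 0x6f, 0 };

/* Returns 1 if constable is byte-for-byte identical to the literal "hello"
 * (sizeof "hello" is 6 because it counts the terminating NUL), 0 otherwise. */
int constable_matches_hello(void)
{
    return sizeof constable == sizeof "hello"
        && memcmp(constable, "hello", sizeof constable) == 0;
}
```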
BTW, you could use the strings(1) command, or perhaps study its source code and adapt it for your needs.
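With GNU strings, the -t x option prints where each string was found as a hexadecimal file offset, which is not the virtual address the program prints. A minimal sketch of the conversion, assuming you have collected sh_offset, sh_addr and sh_size for each section (for example from readelf -S): find the loaded section that contains the offset and rebase it. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one ELF section header (fields named as in Elf64_Shdr). */
struct section_range {
    uint64_t sh_offset;   /* file offset of the section's bytes */
    uint64_t sh_addr;     /* virtual address when loaded (0 if not loaded) */
    uint64_t sh_size;     /* size in bytes */
};

/* Map a file offset (e.g. one reported by `strings -t x`) to a virtual address.
 * Sections with sh_addr == 0 are skipped because they are not loaded; the
 * caller should also leave out SHT_NOBITS sections such as .bss, which occupy
 * no bytes in the file. Returns 1 and sets *vaddr on success, 0 otherwise. */
int file_offset_to_vaddr(const struct section_range *secs, size_t n,
                         uint64_t off, uint64_t *vaddr)
{
    for (size_t i = 0; i < n; i++) {
        const struct section_range *s = &secs[i];
        if (s->sh_addr != 0 && off >= s->sh_offset &&
            off - s->sh_offset < s->sh_size) {
            *vaddr = s->sh_addr + (off - s->sh_offset);
            return 1;
        }
    }
    return 0;
}
```

For instance, with an illustrative .rodata at file offset 0x670 and address 0x400670, offset 0x674 maps to 0x400674.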
See also dladdr(3) and this question.
Bear in mind that two different processes have (by definition!) different address spaces in virtual memory. Read also about ASLR. Also, string literals may occur in shared objects (e.g. shared libraries like libc.so), which are often mmap-ed at different addresses (so the same literal string could have different addresses in different processes!).
You might be interested in libelf, readelf(1) or bfd for reading the ELF file.
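A minimal sketch of locating a section such as .rodata without any library, using only the structure definitions in glibc's <elf.h>. It assumes a 64-bit ELF file already read into memory in the host's byte order, ignores extended section numbering, and does only basic bounds checks; find_elf64_section is a hypothetical name. libelf, readelf(1) or bfd handle the general case.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Find the section called `name` in a 64-bit ELF image held in memory.
 * On success stores a pointer to its bytes, its size and its load address
 * (sh_addr) and returns 1; returns 0 if the image is not a usable ELF64 file,
 * the section is absent, or it has no bytes in the file (SHT_NOBITS). */
int find_elf64_section(const unsigned char *img, size_t img_size,
                       const char *name, const unsigned char **data,
                       size_t *size, uint64_t *addr)
{
    Elf64_Ehdr eh;
    if (img_size < sizeof eh || memcmp(img, ELFMAG, SELFMAG) != 0 ||
        img[EI_CLASS] != ELFCLASS64)
        return 0;
    memcpy(&eh, img, sizeof eh);   /* memcpy avoids unaligned struct access */

    /* The section header table must lie entirely inside the image. */
    if (eh.e_shentsize != sizeof(Elf64_Shdr) || eh.e_shoff > img_size ||
        (img_size - eh.e_shoff) / sizeof(Elf64_Shdr) < eh.e_shnum ||
        eh.e_shstrndx >= eh.e_shnum)
        return 0;

    /* Section names live in the section-name string table (.shstrtab). */
    Elf64_Shdr strtab;
    memcpy(&strtab, img + eh.e_shoff + (size_t)eh.e_shstrndx * sizeof strtab,
           sizeof strtab);
    if (strtab.sh_offset > img_size || strtab.sh_size > img_size - strtab.sh_offset)
        return 0;
    const char *names = (const char *)(img + strtab.sh_offset);

    for (size_t i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        memcpy(&sh, img + eh.e_shoff + i * sizeof sh, sizeof sh);
        if (sh.sh_name >= strtab.sh_size)
            continue;
        /* Skip names that are not NUL-terminated inside .shstrtab. */
        if (memchr(names + sh.sh_name, '\0', strtab.sh_size - sh.sh_name) == NULL)
            continue;
        if (strcmp(names + sh.sh_name, name) != 0)
            continue;
        if (sh.sh_type == SHT_NOBITS || sh.sh_offset > img_size ||
            sh.sh_size > img_size - sh.sh_offset)
            return 0;
        *data = img + sh.sh_offset;
        *size = (size_t)sh.sh_size;
        *addr = sh.sh_addr;
        return 1;
    }
    return 0;
}
```

Scanning the returned bytes for NUL-terminated printable runs, with addr as the base address, yields the address-to-string table for that section.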
Source: https://stackoverflow.com/questions/28621984/map-the-address-of-string-literal-to-string-literal-by-parsing-elf-c-program