Debug core file with no symbols

Submitted by 大憨熊 on 2019-12-02 16:22:32

This type of response from gdb:

(gdb) bt
#0  0xc0199470 in ?? ()

can also happen in the case that the stack was smashed by a buffer overrun, where the return address was overwritten in memory, so the program counter gets set to a seemingly random area.

This is one of the ways that even a build with a matching symbol database can produce a symbol lookup error (or strange-looking backtraces). If you still get this after you have the symbol table, the likely cause is that your customer's data is triggering a bug in your code.

For the future:

  1. Make sure that you always build with an external symbols database (this is not a debug build -- it's a release build, but you store the symbol table separately).
  2. Keep that symbols database around for every version you deploy.

For this situation:

You know the general area, so to see if you are right, go to the stack trace and find the assembly code -- eyeball it and see if you think it matches your source (this is easier if you have some idea what source generated this assembly). If it looks right, then you have some verification of your hypothesis. You might be able to figure out the values of the local variables by looking at the stack (since you know what you passed in and declared).

Under gdb, "info registers" should give you enough of the execution state at the time of the crash to use with a disassembly of the executable and any relevant shared libraries. I usually use objdump to disassemble, redirect the output to a file, then bring up the file in my favorite editor - this is useful for keeping notes as things are figured out. Also, gdb's "info target" and "info sharedlib" can be useful for figuring out where shared libraries are loaded.

With register state, stack contents, and disassembly in hand along with a little luck, it should be straightforward (if tedious) to reconstruct the callstack (unless, of course, the stack has been trashed by a buffer overrun or similar catastrophe... might need an Ouija board or crystal ball in that case.)
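One way to start on the stack contents is to scan a raw dump for words that point into a code segment, since those are candidate return addresses. The sketch below is only an illustration: it assumes you have already saved the stack bytes to a buffer (for example with gdb's "dump binary memory"), that you know the code address ranges from "info target" or "info sharedlib", and that you pass the right word size and byte order for your platform. The function name and parameters are made up for this example.

```python
import struct


def candidate_return_addresses(stack_bytes, stack_base, text_ranges,
                               word_size=4, little_endian=True):
    """Return (stack_address, value) pairs for every aligned word in the
    dump whose value falls inside one of the given code ranges.

    stack_bytes : raw bytes dumped from the stack
    stack_base  : address of the first byte in stack_bytes
    text_ranges : list of (low, high) code address ranges, high exclusive
    """
    fmt = ("<" if little_endian else ">") + ("I" if word_size == 4 else "Q")
    hits = []
    for off in range(0, len(stack_bytes) - word_size + 1, word_size):
        (value,) = struct.unpack_from(fmt, stack_bytes, off)
        # A word pointing into code may be a saved return address.
        if any(lo <= value < hi for lo, hi in text_ranges):
            hits.append((stack_base + off, value))
    return hits
```

Not every hit is a real return address (stale frames and function pointers also show up), so check each one against the disassembly: the instruction just before a genuine return address should be a call.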

You might also be able to correlate a disassembly of the newer version built with -g against the disassembly of the stripped version.
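A rough way to do that correlation mechanically is to ignore addresses and operands and compare only the instruction mnemonics. The following is a minimal sketch, assuming both disassemblies come from GNU "objdump -d" (address, tab, hex bytes, tab, instruction); the helper names are hypothetical, and a mnemonic-only match is coarse, so treat a hit as a hint to inspect by hand rather than proof.

```python
import re
from difflib import SequenceMatcher

# Matches an instruction line in GNU objdump -d output, e.g.
# "  401136:<TAB>48 89 e5   <TAB>mov    %rsp,%rbp"
_INSN = re.compile(r"^\s*([0-9a-f]+):\t[0-9a-f ]+\t(\S+)")


def mnemonics(disassembly):
    """Extract (address, mnemonic) pairs from objdump -d text."""
    out = []
    for line in disassembly.splitlines():
        m = _INSN.match(line)
        if m:
            out.append((int(m.group(1), 16), m.group(2)))
    return out


def best_match(stripped, debug):
    """Find the longest run of identical mnemonics shared by both lists.

    Returns (stripped_address, debug_address, run_length), or None if
    nothing matches.
    """
    a = [m for _, m in stripped]
    b = [m for _, m in debug]
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    if match.size == 0:
        return None
    return stripped[match.a][0], debug[match.b][0], match.size
```

Feed it a window of the stripped disassembly around the crash address and the full disassembly of the -g build; a long matching run suggests which function in the debug build corresponds to the crash site.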

  1. Always use source control (CVS/Git/Subversion/etc.), even for test releases
  2. Tag all releases
  3. Consider (in the future) making a build with debugging (-g) and stripping the executable before shipping. NOTE: Don't make two builds, one with and one without -g; they may well not match up, since -g can on occasion cause different code to be generated even at the same optimization level. In super-performance-critical code you can forgo -g for the critical files; for most files it won't make a difference.
  4. If you're really stuck, dump the stack and dump relevant parts of the heap to hex and look at it by hand; perhaps taking an instrumented copy and looking for similar "signatures" in the generated code and on the stack. This is real "old-school" debugging... :-)

Do you have the exact source that you used to compile the old version (e.g., through a tag in the source tree or something like that)? Maybe you could rebuild using that, and possibly get an insight into where the crash occurred?

Try running "pmap" against the core file (if HP-UX has this tool). This should report the starting addresses of all modules in the core file. With this info, you should be able to take the address of the failure location and figure out which library crashed. Further comparison between the crash address and the addresses of the known functions in the library ("nm" against the library should give you those) may help you determine which function crashed.
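The address arithmetic described above can be sketched in a few lines. This is only an illustration under assumptions: the module list is taken to be (start, size, name) tuples you have copied from a pmap-style listing, the symbol list is (address, name) pairs copied from nm-style output, and symbol addresses in a shared library are assumed to be offsets from its load address. The function names and the sample values in the usage note are hypothetical.

```python
from bisect import bisect_right


def find_module(addr, modules):
    """Return (start, name) of the module whose mapping contains addr.

    modules: list of (start, size, name) tuples.
    """
    for start, size, name in modules:
        if start <= addr < start + size:
            return start, name
    return None


def nearest_symbol(offset, symbols):
    """Return (name, distance) for the closest symbol at or below offset.

    symbols: list of (address, name) pairs; need not be sorted.
    """
    symbols = sorted(symbols)
    addrs = [a for a, _ in symbols]
    i = bisect_right(addrs, offset) - 1
    if i < 0:
        return None  # offset lies below the first known symbol
    sym_addr, name = symbols[i]
    return name, offset - sym_addr
```

For example, if a library is mapped at 0xc0100000 and the crash address is 0xc0199470, you would look up offset 0x99470 in that library's symbol list. The nearest symbol below the offset is only a guess at the enclosing function, because static functions and anything else missing from the symbol list will be attributed to the preceding symbol.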

Even if you do manage to identify the function at the top of the stack, it isn't very likely that this function is the source of the problem... hopefully it has actually crashed in your code and not, say, the standard C string library. Rebuilding the stack trace is the next-best thing at that point.

There is not much information here. The binary is stripped. But since this is a segmentation fault, you should look for places where you might be overwriting a piece of memory.

This is just a suggestion. There can be many problems.

BTW, if you are not able to reproduce it on your local machine, then the volume of data on the customer's side might be the problem.

I don't think the core file is supposed to contain symbols. You need to be able to build a version of your program that is exactly the same as what you shipped to your customer, but with -g. If you strip your debug executable, it should be identical to the shipped version. Only then can gdb give you anything useful.
