Debugging a clobbered static variable in C (gdb broken?)

Posted by 雨燕双飞 on 2019-12-12 13:15:39

Question


I've done a lot of programming but not much in C, and I need advice on debugging. I have a static variable (file scope) that is being clobbered after about 10-100 seconds of execution of a multithreaded program (using pthreads on OS X 10.4). My code looks something like this:

static float some_values[SIZE];
static int * addr;

addr points to a valid memory address for a while, and then gets clobbered with some value (sometimes 0, sometimes nonzero), causing a segfault when it is dereferenced. Poking around with gdb, I have verified that addr is laid out in memory immediately after some_values, as one would expect, so my first guess was that I had used an out-of-bounds index to write to some_values. However, this is a tiny file, so it is easy to check that this is not the problem.

The obvious debugging technique would be to set a watchpoint on the variable addr. But doing so seems to create erratic and inexplicable behavior in gdb. The watchpoint gets triggered at the first assignment to addr; then after I continue execution, I immediately get a nonsensical segfault in another thread...supposedly a segfault on accessing the address of a static variable in a different part of the program! But then gdb lets me read from and write to that memory address interactively.

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x001d5bd0
0x0000678d in receive (arg=0x0) at mainloop.c:39
39          sample_buf_cleared ++;
(gdb) p &sample_buf_cleared
$17 = (int *) 0x1d5bd0
(gdb) p sample_buf_cleared
$18 = 1
(gdb) set sample_buf_cleared = 2
(gdb) 

gdb is obviously confused. Does anyone know why? Or does anyone have any suggestions for debugging this bug without using watchpoints?


Answer 1:


  1. You could put an array of unsigned ints between some_values and addr to determine whether you are overrunning some_values or whether the corruption affects more addresses than you first thought. I would initialize the padding to 0xDEADBEEF or some other obvious pattern that is easy to distinguish and unlikely to occur in the program. If a value in the padding changes, cast it to float and see if the number makes sense as a float.

static float some_values[SIZE];
static unsigned int padding[1024];
static int * addr;

  2. Run the program multiple times. In each run, disable a different thread and see when the problem goes away.

  3. Set the program's process affinity to a single core and then try the watchpoint. You may have better luck if you don't have two threads modifying the value simultaneously. NOTE: This does not prevent the corruption from happening. It may only make it easier to catch in a debugger.
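The padding idea from the first suggestion could be sketched roughly as follows. The value of SIZE and the helper names padding_init and padding_check are assumptions made for illustration; they are not from the question. C does not guarantee the order in which static variables are laid out, so it is worth confirming in gdb (as the asker did) that padding really ends up between some_values and addr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SIZE        256          /* assumed; the question does not give SIZE */
#define PAD_WORDS   1024
#define PAD_PATTERN 0xDEADBEEFu

static float some_values[SIZE];
static unsigned int padding[PAD_WORDS];
static int *addr;

/* Fill the padding with the sentinel pattern; call once at startup. */
void padding_init(void)
{
    for (size_t i = 0; i < PAD_WORDS; i++)
        padding[i] = PAD_PATTERN;
    /* Print the layout so it can be checked against the expected order. */
    fprintf(stderr, "some_values=%p padding=%p &addr=%p\n",
            (void *)some_values, (void *)padding, (void *)&addr);
}

/* Return the index of the first overwritten padding word, or -1 if the
 * padding is intact. The clobbered word is also printed as a float. */
long padding_check(void)
{
    assert(sizeof(float) == sizeof(unsigned int));
    for (size_t i = 0; i < PAD_WORDS; i++) {
        if (padding[i] != PAD_PATTERN) {
            float as_float;
            memcpy(&as_float, &padding[i], sizeof as_float);
            fprintf(stderr, "padding[%zu] = 0x%08x (as float: %g)\n",
                    i, padding[i], as_float);
            return (long)i;
        }
    }
    return -1;
}
```

Calling padding_check() periodically (for example once per loop iteration in each thread) narrows down when the overwrite happens, and the index of the first damaged word hints at how far past some_values the stray write landed.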




Answer 2:


Static variables and multithreading generally do not mix.

Without seeing your code (you should include your threaded code), my guess is that you have two threads concurrently writing to the addr variable. Unsynchronized concurrent writes like that are a data race and will not behave reliably.

You either need to:

  • create separate instances of addr for each thread; or
  • provide some sort of synchronisation around addr to stop two threads changing the value at the same time.
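As a minimal sketch of the second option, every access to addr could go through a pair of accessors guarded by a pthread mutex. The names addr_lock, addr_set and addr_get are hypothetical, and this assumes all code that touches addr can be changed to use them. It only addresses racing writers; it would not stop a stray out-of-bounds write from another variable.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static int *addr;
static pthread_mutex_t addr_lock = PTHREAD_MUTEX_INITIALIZER;

/* Replace the shared pointer while holding the lock. */
void addr_set(int *new_value)
{
    int rc = pthread_mutex_lock(&addr_lock);
    assert(rc == 0);
    addr = new_value;
    rc = pthread_mutex_unlock(&addr_lock);
    assert(rc == 0);
}

/* Read a consistent snapshot of the shared pointer. */
int *addr_get(void)
{
    int rc = pthread_mutex_lock(&addr_lock);
    assert(rc == 0);
    int *value = addr;
    rc = pthread_mutex_unlock(&addr_lock);
    assert(rc == 0);
    return value;
}
```

If the memory that addr points to is also shared between threads, the lock would need to be held for as long as that memory is being used, not just while the pointer is copied.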



Answer 3:


Try using valgrind; I haven't tried valgrind on OS X, and I don't understand your problem, but "try valgrind" is the first thing I think of when you say "clobbered".




Answer 4:


One thing you could try would be to create a separate thread whose only purpose is to watch the value of addr, and to break when it changes. For example:

static int * volatile addr;  // volatile here is important, and must be after the *
void *addr_thread_proc(void *arg)
{
    while(1)
    {
        int *old_value = addr;
        while(addr == old_value) /* spin */;
        __asm__("int3");  // x86 only: break into the debugger, or raise SIGTRAP if no debugger
    }
}
...
pthread_t spin_thread;
pthread_create(&spin_thread, NULL, &addr_thread_proc, NULL);

Then, whenever the value of addr changes, the int3 instruction will run, which will break into the debugger, stopping all threads.




Answer 5:


gdb often acts weird with multithreaded programs. Another solution (if you can afford it) would be to put printf()s all over the place to try and catch the moment where your value gets clobbered. Not very elegant, but sometimes effective.




Answer 6:


I have not done any debugging on OS X, but I have seen the same behavior in GDB on Linux: the program crashes, yet GDB can read and write the memory that the program just tried, unsuccessfully, to read or write.

This doesn't necessarily mean GDB is confused; rather, the kernel allowed GDB to read and write, via ptrace(), memory that the inferior process itself is not allowed to read or write. In other words, it was a (recently fixed) kernel bug.

Still, it sounds like GDB watchpoints aren't working for you for whatever reason.

One technique you could use is to mmap space for some_values rather than statically allocating space for them, arrange for the array to end on a page boundary, and arrange for the next page to be non-accessible (via mprotect).

If any code tries to access past the end of some_values, it will get an exception (effectively you are setting a non-writable "watch point" just past some_values).



Source: https://stackoverflow.com/questions/1005059/debugging-a-clobbered-static-variable-in-c-gdb-broken
