Question
I have a program that deadlocks when one of the threads calls pthread_cond_signal (or pthread_cond_broadcast).
The problem is 100% reproducible in the main program. I could not figure out what is wrong with it, so I extracted the piece of code where wait and signal are called. However, the deadlock cannot be reproduced with the extracted code.
Running valgrind on the main program does not report any invalid reads/writes or memory leaks.
I want to know what the possible reasons are for a deadlock when calling pthread_cond_signal.
The extracted snippet follows.
#include <pthread.h>
#include <math.h>
#include <syscall.h>
#include <unistd.h>   // sleep(), syscall()
#include <assert.h>
#include <stdlib.h>
#include <iostream>
using namespace std;

// Stand-in for the real work done in the main program.
void Task() {
    cerr << syscall(SYS_gettid) << " In Task, sleeping..." << endl;
    sleep(5);
}

pthread_mutex_t lock;
pthread_cond_t cond;
bool doingTheTask = false;

void* func(void*) {
    pthread_mutex_lock(&lock);
    if (doingTheTask) {
        cerr << syscall(SYS_gettid) << " wait... " << endl;
        while (doingTheTask) {  // loop guards against spurious wake-ups
            cerr << syscall(SYS_gettid) << " waiting..." << endl;
            pthread_cond_wait(&cond, &lock);
            cerr << syscall(SYS_gettid) << " woke up!!!" << endl;
        }
    }
    else {
        cerr << syscall(SYS_gettid) << " My Turn to do the task..." << endl;
        assert(!doingTheTask);
        doingTheTask = true;
        pthread_mutex_unlock(&lock);
        Task();
        cerr << syscall(SYS_gettid) << " Before trying to acquire lock" << endl;
        pthread_mutex_lock(&lock);
        cerr << syscall(SYS_gettid) << " After acquiring lock" << endl;
        assert(doingTheTask);
        doingTheTask = false;
        cerr << syscall(SYS_gettid) << " Before broadcast" << endl;
        pthread_cond_broadcast(&cond);
        cerr << syscall(SYS_gettid) << " After broadcast" << endl;
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main() {
    pthread_mutex_init(&lock, NULL);
    pthread_cond_init(&cond, NULL);
    pthread_t thread[2];
    for (int i = 0; i < 2; i++) {
        if (0 != pthread_create(&thread[i], NULL, func, NULL)) {
            cerr << syscall(SYS_gettid) << " Error creating thread" << endl;
            exit(1);
        }
    }
    for (int i = 0; i < 2; i++) {
        pthread_join(thread[i], NULL);
    }
    pthread_mutex_destroy(&lock);
    pthread_cond_destroy(&cond);
    return 0;
}
The only important part is the func function. The other parts are included only so that the snippet compiles.
As I said, the problem is not reproducible in this program. The differences between this snippet and the main program are:
- In the main program, the mutex and condvar are member fields and the function is a member method.
- The task does real work instead of sleeping.
- Multiple threads may wait, so we should broadcast rather than signal. However, the deadlock is 100% reproducible even when I use signal and a single waiting thread.
The problem I am trying to solve with this piece of code is a mechanism that does the task once when at least one of the threads needs it done. No two threads should do the task in parallel, and once one of them has done the task, the others do not need to do it. The clients of this method assume that it blocks until the task is done (so I cannot return immediately after seeing that someone else is doing the task).
The backtraces of the deadlocked threads are:
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1 0x00007ffff73e291c in pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:259
and
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1 0x00007ffff73e30b1 in pthread_cond_signal@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_signal.S:142
The question "pthread_cond_signal deadlocks" describes a similar problem, but it seems the person asking had memory corruption. I do not have memory corruption (according to valgrind).
The problem is 100% reproducible on the two machines I tested it on (the latest Arch Linux and Ubuntu 10.04.3).
A sample output of the main program follows. It again shows that the threads block at the calls to pthread_cond_wait and pthread_cond_signal. (The first column shows the thread IDs.)
3967 In Task, sleeping...
3967 My Turn to do the task...
3967 In Task, sleeping...
3973 wait...
3973 waiting...
3976 <output from some other thread>
3967 Before trying to acquire lock
3967 After acquiring lock
3967 Before broadcast
The main program is in C++, but I am only using the C parts of the language, so I avoided the C++ tag.
Answer 1:
Stupid error.
I was destroying the mutex and condvar before executing signal and wait.
To reproduce, just move the destroy calls before the joining of the threads in the main function.
It is still surprising that on both of my machines, this produces 100% consistent (and wrong) behavior.
Answer 2:
When pthread_cond_wait(&cond, &lock) is called, the lock is released and the thread waits on the condition variable. When the condition variable is signaled, the thread reacquires the lock and returns from pthread_cond_wait(). In your program, you acquire the mutex before calling pthread_cond_broadcast(&cond), so pthread_cond_wait(&cond, &lock) cannot take the lock at the moment it receives the signal. I think that may be the reason for the deadlock.
Source: https://stackoverflow.com/questions/8248458/pthread-cond-signal-causing-deadlock