pthread_cond_signal causing deadlock

妖精的绣舞 submitted on 2019-12-10 20:48:25

Question


I have a program that deadlocks when one of the threads calls pthread_cond_signal (or pthread_cond_broadcast). The problem is 100% reproducible in the main program. I could not figure out what is wrong with it, so I extracted the piece of code where wait and signal are called. However, the deadlock cannot be reproduced with the extracted program.

Running valgrind on the main program does not report any invalid reads/writes or memory leaks.

I would like to know the possible reasons for a deadlock inside a call to pthread_cond_signal.

The extracted snippet follows.

#include <pthread.h>
#include <sys/syscall.h>  // SYS_gettid
#include <unistd.h>       // syscall(), sleep()
#include <assert.h>
#include <stdlib.h>
#include <iostream>

using namespace std;

void Task() {  // stand-in for the real task
    cerr << syscall(SYS_gettid) << " In Task, sleeping..." << endl;
    sleep(5);
}

pthread_mutex_t lock;
pthread_cond_t cond;
bool doingTheTask= false;

void* func(void* ) { 
    pthread_mutex_lock(&lock);
    if (doingTheTask) {
        cerr << syscall(SYS_gettid) << " wait... " << endl;
        while ( doingTheTask) { // re-check the flag: guards against spurious wake-ups
            cerr << syscall(SYS_gettid) << " waiting..." << endl ;
            pthread_cond_wait(&cond, &lock);
            cerr << syscall(SYS_gettid) << " woke up!!!" << endl ;
        }
    }
    else {
        cerr << syscall(SYS_gettid) << " My Turn to do the task..." << endl;
        assert( ! doingTheTask );
        doingTheTask= true;
        pthread_mutex_unlock(&lock);
        Task();
        cerr << syscall(SYS_gettid) << " Before trying to acquire lock" << endl;
        pthread_mutex_lock(&lock);
        cerr << syscall(SYS_gettid) << " After acquiring lock" << endl ;
        assert( doingTheTask );
        doingTheTask = false;
        cerr << syscall(SYS_gettid) << " Before broadcast" << endl;
        pthread_cond_broadcast(&cond);
        cerr << syscall(SYS_gettid) << " After broadcast" << endl;
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}


int main() {
    pthread_mutex_init(&lock,NULL);
    pthread_cond_init(&cond,NULL);
    pthread_t thread[2];

    for ( int i = 0 ;  i < 2 ; i ++ ) {
        if (0 != pthread_create(&thread[i], NULL, func, NULL) ) {
            cerr << syscall(SYS_gettid) << " Error creating thread" << endl;
            exit(1);
        }
    } 

    for ( int i = 0 ;  i < 2 ; i ++ ) {
        pthread_join(thread[i],NULL);
    }
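    // Destroy only after every thread has been joined; destroying them while
    // a thread is still inside wait or signal is undefined behavior.
    pthread_mutex_destroy(&lock);
    pthread_cond_destroy(&cond);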

    return 0;
}

The only important part is func; the rest is there only so that the snippet compiles.

As I said, the problem is not reproducible with this program. The differences between this snippet and the main program are:

  • In the main program, the mutex and condvar are member fields and the function is a member method.
  • Task does real work instead of sleeping.
  • Multiple threads may wait, so we should broadcast rather than signal. However, the deadlock is 100% reproducible even when I use signal with a single waiting thread.

The problem I am trying to solve with this code is a mechanism that does the task once whenever at least one of the threads needs it done. No two threads should do the task in parallel, and once one of them has done it, the others do not need to. The clients of this method assume that it blocks until the task is done, so I cannot return immediately after seeing that someone else is doing the task.
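A minimal sketch of that mechanism with the mutex and condvar as member fields, as in the main program. It assumes a hypothetical class `TaskRunner` with a `runOnce()` method, replaces the real task with a 100 ms `usleep`, and adds a hypothetical `runDemo()` helper; it reuses the wait/broadcast logic of `func` and is not the author's actual code.

```cpp
#include <pthread.h>
#include <unistd.h>   // usleep
#include <cassert>

// Hypothetical class: mutex and condvar are member fields and the
// "do the task once" logic is a member method (names are illustrative).
class TaskRunner {
public:
    TaskRunner() : doingTheTask_(false), runs_(0) {
        pthread_mutex_init(&lock_, NULL);
        pthread_cond_init(&cond_, NULL);
    }
    TaskRunner(const TaskRunner&) = delete;             // owns pthread objects
    TaskRunner& operator=(const TaskRunner&) = delete;

    // Only safe once no thread can still be inside runOnce(),
    // i.e. after every thread using this object has been joined.
    ~TaskRunner() {
        pthread_cond_destroy(&cond_);
        pthread_mutex_destroy(&lock_);
    }

    // Blocks until a run of the task has completed. If another thread is
    // already running it, waits for that run instead of starting a new one.
    void runOnce() {
        pthread_mutex_lock(&lock_);
        if (doingTheTask_) {
            while (doingTheTask_)               // guards against spurious wake-ups
                pthread_cond_wait(&cond_, &lock_);
        } else {
            doingTheTask_ = true;
            pthread_mutex_unlock(&lock_);       // do the work without the lock
            usleep(100 * 1000);                 // stand-in for the real task
            pthread_mutex_lock(&lock_);
            assert(doingTheTask_);
            doingTheTask_ = false;
            ++runs_;
            pthread_cond_broadcast(&cond_);     // wake every waiting thread
        }
        pthread_mutex_unlock(&lock_);
    }

    int runs() {
        pthread_mutex_lock(&lock_);
        int n = runs_;
        pthread_mutex_unlock(&lock_);
        return n;
    }

private:
    pthread_mutex_t lock_;
    pthread_cond_t cond_;
    bool doingTheTask_;
    int runs_;
};

static void* callRunOnce(void* arg) {
    static_cast<TaskRunner*>(arg)->runOnce();
    return NULL;
}

// Hypothetical helper: starts n threads that all call runOnce(), joins them
// while the TaskRunner is still alive, and returns how often the task ran.
int runDemo(int n) {
    assert(n > 0 && n <= 16);
    TaskRunner runner;
    pthread_t threads[16];
    for (int i = 0; i < n; ++i)
        pthread_create(&threads[i], NULL, callRunOnce, &runner);
    for (int i = 0; i < n; ++i)
        pthread_join(threads[i], NULL);  // join first...
    return runner.runs();                // ...~TaskRunner runs after this
}
```

Because the destructor destroys the mutex and condvar, the object's lifetime has to cover every thread that may still call `runOnce()`; joining the threads before the object goes out of scope, as `runDemo()` does, is one way to guarantee that.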

The backtraces of the deadlocked threads are:

#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007ffff73e291c in pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:259

and

#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007ffff73e30b1 in pthread_cond_signal@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_signal.S:142

The question "pthread_cond_signal deadlocks" describes a similar problem, but it seems the person who asked it had memory corruption. I do not have memory corruption (according to valgrind).

The problem is 100% reproducible on the two machines I tested it on (latest ArchLinux and Ubuntu 10.04.3).

A sample output of the main program follows (the first column shows the thread IDs). It again shows that the threads block in the calls to pthread_cond_wait and pthread_cond_broadcast: neither "woke up!!!" nor "After broadcast" is ever printed.

3967    In Task, sleeping...
3967    My Turn to do the task...
3967    In Task, sleeping...
3973    wait...
3973    waiting...
3976    <output from some other thread>
3967    Before trying to acquire lock
3967    After acquiring lock
3967    Before broadcast

The main program is in C++, but I am using only the C parts of the language and have therefore avoided the C++ tag.


Answer 1:


Stupid error: I was destroying the mutex and condvar before the threads had executed signal and wait. To reproduce it, just move the destroy calls in main so that they come before the threads are joined.
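A minimal sketch of the safe shutdown order, assuming the same global `lock` and `cond` as the snippet plus a hypothetical `ready` flag and a hypothetical helper `runAndShutDown()`: every thread that can still touch the mutex or condvar is joined before either is destroyed. POSIX leaves destroying a condition variable that threads are blocked on, or a mutex that is locked or being waited on, as undefined behavior, so the visible symptom (here, threads stuck in `__lll_lock_wait`) may depend on timing and on the implementation.

```cpp
#include <pthread.h>
#include <cassert>

// Same globals as the question's snippet, plus a hypothetical predicate.
pthread_mutex_t lock;
pthread_cond_t cond;
bool ready = false;

void* waitUntilReady(void*) {
    pthread_mutex_lock(&lock);
    while (!ready)                        // predicate loop, as in func()
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
    return NULL;
}

// Hypothetical helper: runs one waiter to completion, then shuts down in the
// safe order. Returns 0 if both destroy calls succeed.
int runAndShutDown() {
    pthread_mutex_init(&lock, NULL);
    pthread_cond_init(&cond, NULL);
    ready = false;

    pthread_t t;
    if (pthread_create(&t, NULL, waitUntilReady, NULL) != 0) {
        pthread_cond_destroy(&cond);
        pthread_mutex_destroy(&lock);
        return -1;
    }

    pthread_mutex_lock(&lock);
    ready = true;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);

    // 1. Join every thread that may still use lock or cond...
    pthread_join(t, NULL);
    // 2. ...and only then destroy them. Moving these calls above the join
    //    reproduces the bug described in this answer (undefined behavior).
    int rc = pthread_cond_destroy(&cond);
    rc |= pthread_mutex_destroy(&lock);
    return rc;
}
```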

It is still surprising that on both of my machines, this produces 100% consistent (and wrong) behavior.




Answer 2:


When we call pthread_cond_wait(&cond, &lock), the lock is released and the thread waits on the condition variable. When it receives the signal on the condition variable, it reacquires the lock and only then returns from pthread_cond_wait(). In your program, you acquire the mutex before calling pthread_cond_broadcast(&cond), so pthread_cond_wait(&cond, &lock) cannot take the lock when it receives the signal. I think that is the reason for the deadlock.
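For comparison: POSIX allows pthread_cond_broadcast() and pthread_cond_signal() to be called whether or not the caller holds the associated mutex, and broadcasting while holding it is a common pattern. A waiter woken that way should only be delayed until the broadcaster unlocks (in the question's `func`, at the final pthread_mutex_unlock). The following minimal sketch, with hypothetical names (`broadcastWhileLocked`, `waitForDone`, `done`, `postBroadcastWrite`), checks this: the broadcaster keeps the mutex after the broadcast and makes one more write, and the waiter, once pthread_cond_wait returns, sees that write.

```cpp
#include <pthread.h>
#include <cassert>

// Hypothetical demo state.
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static bool done = false;
static bool postBroadcastWrite = false;
static bool waiterSawWrite = false;

static void* waitForDone(void*) {
    pthread_mutex_lock(&m);
    while (!done)
        pthread_cond_wait(&cv, &m);   // returns only with m re-acquired
    assert(done);
    // m is held again, so the broadcaster's whole critical section,
    // including the write made after the broadcast, has finished.
    waiterSawWrite = postBroadcastWrite;
    pthread_mutex_unlock(&m);
    return NULL;
}

// Returns true if the waiter woke up and observed the write the broadcaster
// made after broadcasting: it was delayed, not deadlocked.
bool broadcastWhileLocked() {
    pthread_t t;
    pthread_create(&t, NULL, waitForDone, NULL);

    pthread_mutex_lock(&m);
    done = true;
    pthread_cond_broadcast(&cv);      // mutex still held here
    postBroadcastWrite = true;        // still inside the critical section
    pthread_mutex_unlock(&m);         // waiter can re-acquire m only now

    pthread_join(t, NULL);
    return waiterSawWrite;
}
```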



Source: https://stackoverflow.com/questions/8248458/pthread-cond-signal-causing-deadlock
