QNX pthread_mutex_lock causing deadlock error ( 45 = EDEADLK )

Posted by 爷,独闯天下 on 2019-12-13 20:43:26


I am implementing an asynchronous log-writing mechanism for my project's multithreaded application. Below is the relevant part of the code, where the error occurs.

void CTraceFileWriterThread::run()
{
    bool fShoudIRun = shouldThreadsRun();  //  Some global function which decides if operations need to stop. Not really relevant here. Assume "true" value.

    while ( fShoudIRun )
    {
      std::string nextMessage = fetchNext();
      if( !nextMessage.empty() )
      {
          fShoudIRun = shouldThreadsRun();
      }
    }
}

// This is the consumer. It runs in my lower-priority thread.
std::string CTraceFileWriterThread::fetchNext()
{
    // When there are a lot of logs, I mean A LOT, I believe
    // control stays in this function for a long time and another
    // thread calling the "add" function is not able to acquire the lock
    // since it's held here.

    std::string message;

    if( !writeQueue.empty() )
    {
      writeQueueMutex.lock();        // Object of our wrapper around pthread_mutex_lock
      message = writeQueue.front();
      writeQueue.pop();              // std::queue
      writeQueueMutex.unLock() ;
    }
    return message;
}

//  This is the producer and is called from multiple threads.
void CTraceFileWriterThread::add( std::string outputString ) {

if ( !outputString.empty() )
{
    // crashes here while trying to acquire the lock when there are lots of
    // logs in prod systems.
    writeQueueMutex.lock();

    const size_t writeQueueSize = writeQueue.size();

    if ( writeQueueSize == maximumWriteQueueCapacity )
    {
        outputString.append ("\n queue full, discarding traces, traces are incomplete" );
    }

    if ( writeQueueSize <= maximumWriteQueueCapacity )
    {
        bool wasEmpty = writeQueue.empty();
        writeQueue.push( outputString );

        condVarTraceWriter.post(); // will be waiting in a function which calls "fetchNext"
    }

    writeQueueMutex.unLock();
}
}
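
For comparison, here is a minimal sketch of the same producer/consumer queue in which every access to the std::queue, including the empty() check in the consumer, happens while the mutex is held. It stays within C++03 and plain pthreads. The names ScopedLock and TraceQueue are hypothetical and not part of the project. It also simplifies the capacity handling to "drop when full" instead of appending the "queue full" notice, and leaves out the condition variable.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>

// Hypothetical RAII guard: locks in the constructor and unlocks in the
// destructor, so every return path releases the mutex exactly once.
class ScopedLock {
public:
    explicit ScopedLock(pthread_mutex_t& m) : mutex_(m) {
        int rc = pthread_mutex_lock(&mutex_);
        assert(rc == 0);  // in this sketch a failed lock is treated as a bug
        (void)rc;
    }
    ~ScopedLock() { pthread_mutex_unlock(&mutex_); }
private:
    pthread_mutex_t& mutex_;
    ScopedLock(const ScopedLock&);             // non-copyable (C++03 style)
    ScopedLock& operator=(const ScopedLock&);
};

// Hypothetical stand-in for the queue handling in CTraceFileWriterThread.
class TraceQueue {
public:
    explicit TraceQueue(std::size_t capacity) : capacity_(capacity) {
        pthread_mutex_init(&mutex_, NULL);
    }
    ~TraceQueue() { pthread_mutex_destroy(&mutex_); }

    // Producer side: returns false when the message was not queued.
    bool add(const std::string& message) {
        if (message.empty()) return false;
        ScopedLock guard(mutex_);
        if (queue_.size() >= capacity_) return false;  // full: drop the message
        queue_.push(message);
        return true;
    }

    // Consumer side: the empty() check is made while the lock is held.
    std::string fetchNext() {
        std::string message;
        ScopedLock guard(mutex_);
        if (!queue_.empty()) {
            message = queue_.front();
            queue_.pop();
        }
        return message;
    }
private:
    pthread_mutex_t mutex_;
    std::queue<std::string> queue_;
    std::size_t capacity_;
    TraceQueue(const TraceQueue&);
    TraceQueue& operator=(const TraceQueue&);
};
```

With this layout the consumer never reads the queue while a producer is modifying it, and a forgotten unLock() on an early return is not possible because the guard's destructor releases the mutex.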



int wrapperMutex::lock() {
//#[ operation lock()

 int iRetval;
 int iRetry = 10;

 do {
    tRfcErrno = pthread_mutex_lock (&tMutex);
    if ( (tRfcErrno == EINTR) || (tRfcErrno == EAGAIN) ) {
        iRetval = RFC_ERROR;
        iRetry--;
    } else if (tRfcErrno != EOK) {
        iRetval = RFC_ERROR;
        iRetry = 0;
    } else {
        iRetval = RFC_OK;
        iRetry = 0;
    }
 } while (iRetry > 0);

 return iRetval;
}
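
The retry logic of the wrapper can be sketched in a self-contained form as follows. This is only an illustration under assumptions: RetryingMutex, LockResult, and kMaxAttempts are hypothetical names standing in for wrapperMutex, RFC_OK / RFC_ERROR, and the retry count of 10. One detail worth noting is that POSIX states pthread_mutex_lock does not return EINTR, so on a conforming system that branch should never be taken. The sketch also keeps the raw return code so the caller can log it.

```cpp
#include <pthread.h>
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical stand-ins for RFC_OK / RFC_ERROR from the post.
enum LockResult { LOCK_OK = 0, LOCK_ERROR = 1 };

// Hypothetical wrapper, not the project's wrapperMutex.
class RetryingMutex {
public:
    RetryingMutex() : lastErrno_(0) { pthread_mutex_init(&mutex_, NULL); }
    ~RetryingMutex() {
        int rc = pthread_mutex_destroy(&mutex_);
        assert(rc == 0);  // destroying a mutex that is still locked is a bug
        (void)rc;
    }

    // Tries up to kMaxAttempts times, but only repeats for the two codes
    // the original wrapper treats as retryable.
    LockResult lock() {
        for (int attempt = 0; attempt < kMaxAttempts; ++attempt) {
            lastErrno_ = pthread_mutex_lock(&mutex_);
            if (lastErrno_ == 0) {
                return LOCK_OK;
            }
            if (lastErrno_ != EINTR && lastErrno_ != EAGAIN) {
                break;  // e.g. EDEADLK or EINVAL: retrying cannot help
            }
        }
        return LOCK_ERROR;
    }

    int unlock() { return pthread_mutex_unlock(&mutex_); }

    // Raw return code of the last pthread_mutex_lock() call, and its text.
    int lastErrno() const { return lastErrno_; }
    std::string lastErrorText() const { return std::strerror(lastErrno_); }

private:
    static const int kMaxAttempts = 10;
    pthread_mutex_t mutex_;
    int lastErrno_;
    RetryingMutex(const RetryingMutex&);             // non-copyable
    RetryingMutex& operator=(const RetryingMutex&);
};
```

A caller would check the result of lock() and, on LOCK_ERROR, log lastErrno() together with lastErrorText() instead of proceeding into the critical section without holding the lock.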



I generated a core dump and analysed it with GDB. Here are the findings:

  1. Program terminated with signal 11, Segmentation fault.

  2. "Errno=45" (EDEADLK) in the add function, at the point where I try to acquire the lock. Our wrapper around pthread_mutex_lock tries to acquire the lock up to 10 times before it gives up.
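
For reference, one situation in which POSIX specifies EDEADLK is an error-checking mutex being locked again by the thread that already owns it. The sketch below reproduces just that case in isolation. It is not taken from the project, the function name relockFromSameThread is hypothetical, and it does not claim that this is what happens in the production system. The numeric value of EDEADLK is platform-specific, so the code compares against the symbolic constant.

```cpp
#include <pthread.h>
#include <cassert>
#include <cerrno>

// Locks an error-checking mutex twice from the same thread and returns the
// result of the second pthread_mutex_lock() call.
int relockFromSameThread() {
    pthread_mutexattr_t attr;
    pthread_mutex_t mutex;

    int rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&mutex, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&mutex);          // first lock succeeds
    assert(rc == 0);
    int second = pthread_mutex_lock(&mutex);  // same thread already owns it

    pthread_mutex_unlock(&mutex);             // release the first lock
    pthread_mutex_destroy(&mutex);
    return second;
}
```

Calling relockFromSameThread() and comparing the result with EDEADLK shows whether the platform reports the relock for this mutex type. A missing unLock() on some code path, or a lock() call reached twice by one thread, would produce the same return code.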

The code works fine when there are fewer logs. Also, we do not have C++11 or later, so we are restricted to the QNX mutex. Any help is appreciated, as I have been looking at this issue for over a month with little progress. Please ask if any more information is required.

