How to implement thread-safe container with natural looking syntax?

Question

Preface

The code below results in undefined behaviour if the two calls run concurrently as written:

vector<int> vi;
...
vi.push_back(1);  // thread-1
...
vi.pop_back(); // thread-2

The traditional approach is to fix it with std::mutex:

std::lock_guard<std::mutex> lock(some_mutex_specifically_for_vi);
vi.push_back(1);

However, as the code grows, this becomes cumbersome: every call has to be preceded by a lock, and every shared object may need its own mutex.

Objective

Without compromising the syntax for accessing an object, and without declaring an explicit mutex, I would like to create a template that does all the boilerplate work, e.g.:

Concurrent<vector<int>> vi;  // specific `vi` mutex is auto declared in this wrapper
...
vi.push_back(1); // thread-1: locks `vi` only until `push_back()` is performed
...
vi.pop_back();  // thread-2: locks `vi` only until `pop_back()` is performed

In current C++ it's impossible to achieve exactly this. However, I have attempted an implementation in which, if you just change vi. to vi->, things work as described in the code comments above.

Code

// The `Class` member is accessed via `->` instead of `.` operator
// For `const` object, it's assumed only for read purpose; hence no mutex lock
template<class Class,
         class Mutex = std::mutex>
class Concurrent : private Class
{
  public: using Class::Class;

  private: class Safe
           {
             public: Safe (Concurrent* const this_,
                           Mutex& rMutex) :
                     m_This(this_),
                     m_rMutex(rMutex)
                     { m_rMutex.lock(); }
             public: ~Safe () { m_rMutex.unlock(); }

             public: Class* operator-> () { return m_This; }
             public: const Class* operator-> () const { return m_This; }
             public: Class& operator* () { return *m_This; }
             public: const Class& operator* () const { return *m_This; }

             private: Concurrent* const m_This;
             private: Mutex& m_rMutex;
           };

  public: Safe ScopeLocked () { return Safe(this, m_Mutex); }
  public: const Class* Unsafe () const { return this; }

  public: Safe operator-> () { return ScopeLocked(); }
  public: const Class* operator-> () const { return this; }
  public: const Class& operator* () const { return *this; }

  private: Mutex m_Mutex;
};
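
As a rough usage sketch, the per-call locking can be exercised from two threads. The wrapper below is a condensed copy of the class above (only the non-const operator-> path is kept), and fill_from_two_threads is a hypothetical helper added purely for illustration. It relies on the returned proxy not being copied, which C++17 guarantees and mainstream C++14 compilers do in practice.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Condensed copy of the question's wrapper: operator-> returns a proxy that
// holds the mutex until the end of the full expression.
template <class Class, class Mutex = std::mutex>
class Concurrent : private Class {
public:
    using Class::Class;

private:
    class Safe {
    public:
        Safe(Class* obj, Mutex& m) : obj_(obj), mutex_(m) { mutex_.lock(); }
        ~Safe() { mutex_.unlock(); }
        Class* operator->() { return obj_; }
    private:
        Class* obj_;
        Mutex& mutex_;
    };

public:
    Safe operator->() { return Safe(this, mutex_); }

private:
    Mutex mutex_;
};

// Hypothetical helper: two threads each append `per_thread` values.
std::size_t fill_from_two_threads(int per_thread) {
    Concurrent<std::vector<int>> vi;
    auto producer = [&vi, per_thread] {
        for (int i = 0; i < per_thread; ++i)
            vi->push_back(i);  // locked only for the duration of this call
    };
    std::thread t1(producer);
    std::thread t2(producer);
    t1.join();
    t2.join();
    return vi->size();
}
```

Each push_back is an independent operation, which is the case where one lock per call is sufficient.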

Questions

Does using a temporary object to call a function through an overloaded operator->() lead to undefined behavior in C++?
Does this small utility class provide thread-safety for an encapsulated object in all cases?

Clarifications

Inter-dependent statements need the lock to be held for longer, which is why the ScopeLocked() method was introduced. It is the equivalent of a std::lock_guard, except that the mutex for a given object is maintained internally, so it is still better syntactically.
For example, instead of the flawed design below (as suggested in an answer):

if(vi->size() > 0)
  i = vi->front(); // Bad: `vi` can change after `size()` & before `front()`

one should rely on the design below:

auto viLocked = vi.ScopeLocked();
if(viLocked->size() > 0)
  i = viLocked->front();  // OK; `vi` is locked till the scope of `viLocked`

In other words, inter-dependent statements should go through ScopeLocked().
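
The same "hold the lock across dependent statements" idea can also be expressed by passing a callable to the wrapper, so the locked scope is exactly the body of the callable. This is only an alternative sketch, not part of the class above: Guarded, with_lock and front_or are hypothetical names, and C++14 is assumed for decltype(auto).

```cpp
#include <cassert>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical wrapper: the value is reachable only through with_lock().
template <class T>
class Guarded {
public:
    // Runs `f` on the wrapped value while the mutex is held.
    template <class F>
    decltype(auto) with_lock(F&& f) {
        std::lock_guard<std::mutex> lock(mutex_);
        return std::forward<F>(f)(value_);
    }

private:
    T value_{};
    std::mutex mutex_;
};

// The emptiness check and the read happen in one critical section.
int front_or(Guarded<std::vector<int>>& g, int fallback) {
    return g.with_lock([fallback](std::vector<int>& v) -> int {
        return v.empty() ? fallback : v.front();
    });
}
```

Compared with a ScopeLocked()-style guard object, this form makes it harder to keep the lock alive by accident, at the cost of wrapping the dependent statements in a lambda.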

Answer 1:

Don't do this.

It's almost impossible to make a thread safe collection class in which every method takes a lock.

Consider the following instance of your proposed Concurrent class.

Concurrent<vector<int>> vi;

A developer might come along and do this:

 int result = 0;
 if (vi.size() > 0)
 {
     result = vi.at(0);
 }

Another thread might make this change between the first thread's calls to size() and at(0):

vi.clear();

So now, the synchronized order of operations is:

vi.size()  // returns 1
vi.clear() // sets the vector's size back to zero
vi.at(0)   // throws exception since size is zero

So even though you have a thread safe vector class, two competing threads can result in an exception being thrown in unexpected places.

That's just the simplest example. There are other ways in which multiple threads attempting to read/write/iterate at the same time could inadvertently break your guarantee of thread safety.

You mentioned that the whole thing is motivated by this pattern being cumbersome:

vi_mutex.lock();
vi.push_back(1);
vi_mutex.unlock();

In fact, there are helper classes that make this cleaner, namely lock_guard, which locks a mutex in its constructor and unlocks it in its destructor:

{
    lock_guard<mutex> lck(vi_mutex);
    vi.push_back(1);
}

Other code then becomes thread safe in practice, like this:

{
     lock_guard<mutex> lck(vi_mutex);
     result = 0;
     if (vi.size() > 0)
     {
         result = vi.at(0);
     }
}

Update:

I wrote a sample program, using your Concurrent class to demonstrate the race condition that leads to a problem. Here's the code:

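#include <iostream>
#include <list>
#include <mutex>
#include <thread>
using namespace std;

// `Concurrent` is the class template from the question.
Concurrent<list<int>> g_list;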

void thread1()
{
    while (true)
    {
        if (g_list->size() > 0)
        {
            int value = g_list->front();
            cout << value << endl;
        }
    }

}

void thread2()
{
    int i = 0;
    while (true)
    {
        if (i % 2)
        {
            g_list->push_back(i);
        }
        else
        {
            g_list->clear();
        }
        i++;
    }
}

int main()
{

    std::thread t1(thread1);
    std::thread t2(thread2);

    t1.join(); // run forever

    return 0;
}

In a non-optimized build, the program above crashes in a matter of seconds. (In an optimized release build the crash is harder to trigger, but the bug is still there.)
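
For contrast, here is a bounded variant of the same two loops in which the size check and front() share one lock. It is a sketch under stated assumptions: it uses an explicit mutex rather than the Concurrent wrapper, and g_mutex, g_values, reader, writer and run_bounded are hypothetical names.

```cpp
#include <cassert>
#include <list>
#include <mutex>
#include <thread>

// Hypothetical globals: a plain list plus the mutex that protects it.
std::mutex g_mutex;
std::list<int> g_values;

// Reader: the emptiness check and front() happen under the same lock.
// Returns how many values were actually read.
int reader(int iterations) {
    int reads = 0;
    for (int n = 0; n < iterations; ++n) {
        std::lock_guard<std::mutex> lck(g_mutex);
        if (!g_values.empty()) {
            int value = g_values.front();
            (void)value;  // a real program would use the value here
            ++reads;
        }
    }
    return reads;
}

// Writer: alternates between clearing the list and appending to it.
void writer(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> lck(g_mutex);
        if (i % 2)
            g_values.push_back(i);
        else
            g_values.clear();
    }
}

// Runs both loops concurrently for a bounded number of iterations.
int run_bounded(int iterations) {
    int reads = 0;
    std::thread t1([&reads, iterations] { reads = reader(iterations); });
    std::thread t2(writer, iterations);
    t1.join();
    t2.join();
    return reads;
}
```

Because clear() can no longer run between the check and the read, front() is never called on an empty list.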

Answer 2:

This endeavor is fraught with peril and performance problems. Iterators generally depend on the state of the whole data structure and will usually be invalidated if the data structure changes in certain ways. This means that iterators either need to hold a mutex on the whole data structure when they're created, or you'll need to define a special iterator that carefully locks only the stuff it's depending on in the moment, which is likely more than the state of the node/element it's currently pointing at. And this would require internal knowledge of the implementation of what's being wrapped.

As an example, think about how this sequence of events might play out:

Thread 1:

 void thread1_func(Concurrent<vector<int>> &cq)
 {
       cq.push_back(1);
       cq.push_back(2);
 }

Thread 2:

 void thread2_func(Concurrent<vector<int>> &cq)
 {
       ::std::copy(cq.begin(), cq.end(), ostream_iterator<int>(cout, ", "));
 }

How do you think that would play out? Even if every member function is nicely wrapped in a mutex so they're all serialized and atomic, you're still invoking undefined behavior as one thread changes a data structure another is iterating over.

You could make creating an iterator also lock a mutex. But then, if the same thread creates another iterator, it should be able to grab the mutex, so you'll need to use a recursive mutex.

And, of course, that means that your data structure can't be touched by any other threads while one thread is iterating over it, significantly decreasing concurrency opportunities.

It's also very prone to race conditions. One thread makes a call and discovers some fact about the data structure that it's interested in. Then, assuming this fact is true, it makes another call. But, of course, the fact is no longer true because some other thread has poked its nose in between getting the fact and using the fact. Calling size() and then deciding whether or not to iterate is just one example.
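
One common way around the iteration problem, when the container is small enough to copy, is to take a snapshot under the lock and iterate over the private copy. The sketch below is only an illustration of that trade-off; SharedVector, snapshot and format_all are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container wrapper that never hands out iterators.
class SharedVector {
public:
    void push_back(int v) {
        std::lock_guard<std::mutex> lock(mutex_);
        data_.push_back(v);
    }

    // Copies the contents while holding the lock; the copy belongs to the
    // caller, so iterating over it needs no further synchronization.
    std::vector<int> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return data_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<int> data_;
};

// Formats every element of a snapshot, separated by ", ".
std::string format_all(const SharedVector& sv) {
    std::ostringstream out;
    for (int v : sv.snapshot())
        out << v << ", ";
    return out.str();
}
```

The reader may see slightly stale data, but writers are blocked only for the duration of the copy rather than for the whole iteration.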

Answer 3:

Does using a temporary object to call a function through an overloaded operator->() lead to undefined behavior in C++?

No. Temporaries are only destroyed at the end of the full expression that created them. And using a temporary object with an overloaded operator-> to "decorate" member access is exactly why the overloaded operator is defined the way it is. It is used for logging, for performance measurement in dedicated builds and, as you discovered yourself, for locking all member accesses to an encapsulated object.
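
The lifetime rule can be observed directly with a proxy that records when it is created and destroyed. This is a minimal sketch with hypothetical names (g_events, Widget, Traced, record, trace_one_call); it assumes the returned proxy is not copied, which C++17 guarantees and mainstream C++14 compilers do in practice.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> g_events;  // hypothetical trace of what happened

struct Widget {
    int poke() { g_events.push_back("poke"); return 42; }
};

// Wrapper whose operator-> returns a temporary proxy around the widget.
class Traced {
    class Proxy {
    public:
        explicit Proxy(Widget* w) : w_(w) { g_events.push_back("enter"); }
        ~Proxy() { g_events.push_back("leave"); }
        Widget* operator->() { return w_; }
    private:
        Widget* w_;
    };

public:
    Proxy operator->() { return Proxy(&w_); }

private:
    Widget w_;
};

int record(int v) { g_events.push_back("use result"); return v; }

// Performs one decorated call and returns the recorded order of events.
std::vector<std::string> trace_one_call() {
    g_events.clear();
    Traced t;
    record(t->poke());  // the proxy lives until the end of this full expression
    return g_events;
}
```

The proxy is destroyed only after record() has returned, so with a locking proxy the mutex would stay held for the whole statement and not just for poke().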

The range based for loop syntax is not working in this case. It gives compilation error. What is the correct way to fix it?

Your Iterator function doesn't return an actual iterator as far as I can tell. Compare Safe<Args...>(std::forward<Args>(args)...); with the argument list Iterator(Class::NAME(), m_Mutex). What is Base when the argument in Args is deduced from Class::NAME()?

Does this small utility class serve the purpose of thread-safety for an encapsulated object in all the cases?

It looks fairly safe for simple value types. But of course that is contingent on all access being done via the wrapper.

For more complex containers, where iterator invalidation comes into consideration, then making a single member access atomic will not necessarily prevent race conditions (as was noted in the comments). I suppose you may create an iterator wrapper that locks the container for the duration of its lifetime... but then you lose most of the useful container API.

Answer 4:

In addition to the other issues, your assumption about const is also wrong. For many of the STL types, the const methods still require that the container is protected against modification for the duration of their execution.

For that you need at least a shared mutex, and it must be declared mutable so that it can be locked in the const path. Be aware, though, that existing std::shared_mutex implementations reportedly deviate from the specification by introducing additional synchronization points, due to a premature "exclusive first" scheduling strategy copied from Boost. Treat them as a performance optimization with the same constraints as std::mutex, and don't rely on the specification.

When using const iterators (cbegin, cend) you also must be able to obtain a lock for the entire transaction.

So you need a ScopeLocked()-style scoped lock for const access too.
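
A minimal sketch of locked const access, assuming C++17 for std::shared_mutex (std::shared_timed_mutex would be the C++14 substitute). SharedInts and its members are hypothetical names and not part of the question's class.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical wrapper in which const access also takes a lock.
class SharedInts {
public:
    void push_back(int v) {
        std::unique_lock<std::shared_mutex> lock(mutex_);  // exclusive access
        data_.push_back(v);
    }

    // Read-only access locks in shared mode, so readers may overlap with each
    // other but never with a writer. The lock covers the whole iteration.
    long sum() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        long total = 0;
        for (auto it = data_.cbegin(); it != data_.cend(); ++it)
            total += *it;
        return total;
    }

private:
    mutable std::shared_mutex mutex_;  // mutable so const members can lock it
    std::vector<int> data_;
};
```

The mutable keyword is what allows sum() to stay const while still locking.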

Same verdict as the other responses: using an inline -> directly on Concurrent is a dangerous design choice, a typical pistol aimed straight at your own foot. It is pretty much guaranteed to blow up when someone naively refactors from the . operator to ->.

Answer 5:

I can't resist answering this, as I have been working on such a utility library for a few months now. Naturally, I think the idea is very good: it leads to much clearer and safer code. To answer the questions:

As already answered: it does not lead to undefined behaviour, because the temporary exists until the end of the full expression in which it appears.
Your utility class can be used as universally as std::lock_guard. std::lock_guard is the go-to mechanism in C++11 for providing thread-safety, whatever objects you are working with.

Many answers point out possible misuses of your class (the "iterator from a std::vector" example), but I think these are irrelevant. Of course, you must try to limit the possibility of misuse, but you cannot ultimately remove them all. You get the same iterator problem using std::lock_guard anyway, and the purpose of your library is not to eliminate multi-threading mistakes, but to at least remove a few using the type system.

Some issues I see in your code:

The standard library differentiates between std::lock_guard and std::unique_lock, and I think it is important to keep this distinction: the former is for your daily mutex locking, the latter is for use with std::condition_variable, for instance.
Because you explicitly call lock() and unlock() on the mutex, you prevent the beneficial use of shared mutexes, as those have a lock_shared method for read-only access.
You give access to the encapsulated object through a const pointer / const reference. Read-only access still needs the mutex to be locked, because another thread could be modifying the object concurrently: you might be reading partly updated information.
Your class is less flexible than the standard ones. For instance, std::lock_guard can accept an already locked mutex using the std::adopt_lock tag, and this can be very useful.
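
To make two of these points concrete, the sketch below shows std::adopt_lock taking over mutexes that std::lock has already acquired, and std::unique_lock being used with std::condition_variable. Account, Mailbox, transfer, post and take are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>

struct Account {  // hypothetical
    std::mutex m;
    int balance = 0;
};

// std::lock acquires both mutexes without deadlocking; the guards then adopt
// the already-held locks so they are released on scope exit.
// The two accounts must be distinct objects.
void transfer(Account& from, Account& to, int amount) {
    std::lock(from.m, to.m);
    std::lock_guard<std::mutex> g1(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> g2(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

struct Mailbox {  // hypothetical
    std::mutex m;
    std::condition_variable cv;
    std::queue<int> items;
};

void post(Mailbox& box, int v) {
    {
        std::lock_guard<std::mutex> g(box.m);  // plain scoped locking
        box.items.push(v);
    }
    box.cv.notify_one();
}

// wait() has to unlock and relock the mutex, which lock_guard cannot do,
// so a unique_lock is required here.
int take(Mailbox& box) {
    std::unique_lock<std::mutex> lk(box.m);
    box.cv.wait(lk, [&box] { return !box.items.empty(); });
    int v = box.items.front();
    box.items.pop();
    return v;
}
```

A wrapper that only ever calls lock() and unlock() internally offers neither of these options to its users.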

I'll be happy to point you to my own implementation if you are interested.

Source: https://stackoverflow.com/questions/54781372/how-to-implement-thread-safe-container-with-natural-looking-syntax

Tags

c++

templates

thread-safety

c++14

temporary-objects