I have a piece of mature geospatial software that has recently had areas rewritten to take better advantage of the multiple processors available in modern PCs. Specifically
Not really an answer:
Testing for multithreading bugs is very difficult. Most of them only show up when two (or more) threads reach specific places in the code in a specific order. Whether and when this condition is met depends on the timing of the running process. This timing may change due to any of the following pre-conditions:
There are surely more pre-conditions that I forgot.
Because MT bugs depend so heavily on the exact timing of the running code, Heisenberg's "uncertainty principle" comes into play here: if you test for MT bugs, your "measurements" change the timing, which may prevent the bug from occurring...
The timing is what makes MT bugs so highly non-deterministic. In other words: you may have software that runs for months, crashes one day, and then runs for years afterwards. If you don't have debug logs, core dumps, etc., you may never know why it crashed.
So my conclusion is: there is no really good way to unit-test for thread safety. You always have to keep your eyes open when programming.
To make this clear I will give you a (simplified) example from real life (I encountered this when I changed employers and looked at the existing code there):
Imagine you have a class that should be deleted automatically once no one uses it anymore. So you build a reference counter into that class. (I know it is bad style to delete an instance of a class in one of its methods. That is a consequence of simplifying the real code, which uses a Ref class to handle counted references.)
    class A {
    private:
        int refcount;
    public:
        A() : refcount(0) {
        }
        void Ref() {
            refcount++;
        }
        void Release() {
            refcount--;
            if (refcount == 0) {
                delete this;
            }
        }
    };
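This seems pretty simple and nothing to worry about. But it is not thread-safe! The reason is that "refcount++" and "refcount--" are not atomic operations. Each of them consists of three operations: read the value from memory into a register, modify the value in the register, and write the value back to memory.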
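The thread can be interrupted between any of those operations, and another thread may manipulate the same refcount in the meantime. So if, for example, two threads want to increment a refcount of 8, the following COULD happen: both threads read 8, both add one to their own copy, and both write 9 back, so one of the two increments is lost.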
So the result is: refcount = 9 but it should have been 10!
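One interleaving that produces this result can be replayed deterministically in a single thread by spelling out the three steps of each increment by hand. This is only an illustration, not the real code: the function name `replay_lost_update` and the variables `reg1` and `reg2` (standing in for each thread's register) are made up for this sketch, and the start value of 8 is inferred from the numbers above.

```cpp
#include <cassert>

// Replays one unlucky interleaving of two "refcount++" operations in a single
// thread. reg1 and reg2 stand in for the registers of thread 1 and thread 2.
int replay_lost_update(int refcount) {
    int reg1 = refcount;   // thread 1: read refcount
    int reg2 = refcount;   // thread 2: read the same, still unmodified value
    reg1 = reg1 + 1;       // thread 1: increment its private copy
    reg2 = reg2 + 1;       // thread 2: increment its private copy
    refcount = reg1;       // thread 1: write back
    refcount = reg2;       // thread 2: write back, overwriting thread 1's update
    return refcount;
}

// Two increments were issued, but only one of them survives.
void check_replay() {
    assert(replay_lost_update(8) == 9);  // a correct counter would reach 10
}
```

In the real program nobody chooses this ordering; the scheduler (or a second core) produces it by accident, which is why it shows up so rarely.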
This can only be solved by making the update of refcount atomic (e.g. with InterlockedIncrement() and InterlockedDecrement() on Windows).
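A minimal sketch of such a fix, assuming a C++11 compiler and using the portable `std::atomic<int>` in place of the Windows Interlocked functions. The `Count()` accessor and the `hammer` helper are hypothetical additions for demonstration only. Note that `Release()` has to act on the value returned by the atomic decrement; decrementing atomically and then reading refcount again in a separate step would reintroduce a race.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class A {
private:
    std::atomic<int> refcount;
    ~A() = default;  // private: instances are destroyed only through Release()
public:
    A() : refcount(0) {}

    void Ref() {
        refcount.fetch_add(1);  // one indivisible read-modify-write
    }

    void Release() {
        // fetch_sub returns the value held *before* the decrement, so exactly
        // one caller observes 1 and becomes responsible for the deletion.
        int previous = refcount.fetch_sub(1);
        assert(previous > 0);
        if (previous == 1) {
            delete this;
        }
    }

    int Count() const { return refcount.load(); }  // hypothetical, for the demo
};

// Hypothetical helper: several threads take and drop references concurrently.
// Returns the count left over once all workers have finished.
int hammer(int threads, int iterations) {
    A* a = new A();
    a->Ref();  // the caller's own reference keeps the object alive
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([a, iterations] {
            for (int i = 0; i < iterations; ++i) {
                a->Ref();
                a->Release();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    int remaining = a->Count();
    a->Release();  // drops the last reference and deletes the object
    return remaining;
}
```

With the atomic counter, a call such as `hammer(4, 100000)` leaves exactly the caller's single reference behind, no matter how the threads interleave. A passing run of this kind does not prove that the original non-atomic version is broken, and a passing run of the original would not prove it correct.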
This bug is simply untestable! The reason is that it is highly unlikely that two threads try to modify the refcount of the same instance at the same time and that a context switch occurs in the middle of that code.
But it can happen! (The probability increases on a multi-processor or multi-core system, because no context switch is needed to make it happen.) It will happen, after some days, weeks or months!