This is a follow up to this question, where I didn\'t get any input on this point. Here is the brief question:
Is it possible to detect and debug problems co
The best thing I can think of is to stay away from multi-threaded code whenever possible. It seems there are very few programmers who can write bug free multi threaded applications and I would argue that there are no coders beeing able to write bug free large multi threaded applications.
Narrow down on the functions that are being called, and rule out what could and could not be to blame. When you find sections of code that you suspect may be causing the issue, add lots of detailed logging / tracing to it. Once the issue occurs again, inspect the logs to see how the code executed differently than it does in "baseline" situations.
If you are using Visual Studio, you can also set breakpoints and use the Parallel Stacks window. Parallel Stacks is a huge help when debugging concurrent code, and will give you the ability to switch between threads to debug them independently. More info-
https://docs.microsoft.com/en-us/visualstudio/debugger/using-the-parallel-stacks-window?view=vs-2019
https://docs.microsoft.com/en-us/visualstudio/debugger/walkthrough-debugging-a-parallel-application?view=vs-2019
Threading/concurrency problems are notoriously difficult to replicate - which is one of the reasons why you should design to avoid or at least minimize the probabilities. This is the reason immutable objects are so valuable. Try to isolate mutable objects to a single thread, and then carefully control the exchange of mutable objects between threads. Attempt to program with a design of object hand-over, rather than "shared" objects. For the latter, use fully synchronized control objects (which are easier to reason about), and avoid having a synchronized object utilize other objects which must also be synchronized - that is, try to keep them self contained. Your best defense is a good design.
Deadlocks are the easiest to debug, if you can get a stack trace when deadlocked. Given the trace, most of which do deadlock detection, it's easy to pinpoint the reason and then reason about the code as to why and how to fix it. With deadlocks, it always going to be a problem acquiring the same locks in different orders.
Live locks are harder - being able to observe the system while in the error state is your best bet there.
Race conditions tend to be extremely difficult to replicate, and are even harder to identify from manual code review. With these, the path I usually take, besides extensive testing to replicate, is to reason about the possibilities, and try to log information to prove or disprove theories. If you have direct evidence of state corruption you may be able to reason about the possible causes based on the corruption.
The more complex the system, the harder it is to find concurrency errors, and to reason about it's behavior. Make use of tools like JVisualVM and remote connect profilers - they can be a life saver if you can connect to a system in an error state and inspect the threads and objects.
Also, beware the differences in possible behavior which are dependent on the number of CPU cores, pipelines, bus bandwidth, etc. Changes in hardware can affect your ability to replicate the problem. Some problems will only show on single-core CPU's others only on multi-cores.
One last thing, try to use concurrency objects distributed with the system libraries - e.g in Java java.util.concurrent
is your friend. Writing your own concurrency control objects is hard and fraught with danger; leave it to the experts, if you have a choice.
Visual Studio allows you to inspect the call stack of each thread, and you can switch between them. It is by no means enough to track all kinds of threading issues, but it is a start. A lot of improvements for multi-threaded debugging is planned for the upcoming VS2010.
I have used WinDbg + SoS for threading issues in .NET code. You can inspect locks (sync blokcs), thread call stacks etc.
Develop code the way that Princess recommended for your other question (Immutable objects, and Erlang-style message passing). It will be easier to detect multi-threading problems, because the interactions between threads will be well defined.