Embedding the Python interpreter in a C/C++ application is well documented. What is the best approach to running multiple Python interpreters on multiple operating system threads (i.
This isn't exactly an answer to your question, but you could use separate processes instead of threads. Each process then has its own interpreter state, so the problems should vanish.
Pros:
Cons:
If you use shared memory for IPC, your resulting application code shouldn't differ too much from what you'd get with threads.
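To give a rough idea of what that can look like, here is a minimal sketch in Python rather than in the C/C++ host. It assumes a POSIX system where the "fork" start method is available. The names `worker`, `square_in_processes` and `SLOT` are made up for illustration. Each worker runs in its own process, and so in its own interpreter, and writes its result into a shared memory block that the parent reads back afterwards.

```python
import multiprocessing as mp
import struct
from multiprocessing import shared_memory

# One signed 64-bit integer slot per worker (layout chosen for this sketch).
SLOT = struct.Struct("<q")


def worker(shm_name, index, value):
    """Runs in a separate process, i.e. with its own interpreter state."""
    shm = shared_memory.SharedMemory(name=shm_name)  # attach to existing block
    try:
        # Write the result into this worker's slot of the shared block.
        SLOT.pack_into(shm.buf, index * SLOT.size, value * value)
    finally:
        shm.close()


def square_in_processes(values):
    """Square each value in its own process, collecting results via shared memory."""
    # Assumption: POSIX "fork" start method; other platforms need "spawn"
    # plus an `if __name__ == "__main__":` guard around the entry point.
    ctx = mp.get_context("fork")
    shm = shared_memory.SharedMemory(
        create=True, size=max(1, len(values)) * SLOT.size
    )
    try:
        procs = [
            ctx.Process(target=worker, args=(shm.name, i, v))
            for i, v in enumerate(values)
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # Read every slot back in the parent process.
        return [
            SLOT.unpack_from(shm.buf, i * SLOT.size)[0]
            for i in range(len(values))
        ]
    finally:
        shm.close()
        shm.unlink()  # release the block once the parent is done with it
```

Calling `square_in_processes([1, 2, 3, 4])` gives back the squares in input order. In a C/C++ host the same shape would presumably be built from `fork` (or `CreateProcess` on Windows) and `shm_open`/`mmap`, with each child process initializing its own embedded interpreter.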
Given that some people argue you should always prefer processes over threads, I'd at least consider it as an alternative if it fits your constraints in any way.