Basically there seems to be massive confusion/ambiguity over when exactly PyEval_InitThreads() is supposed to be called, and what accompanying API
To quote above:
The short answer: you shouldn't care about releasing the GIL after calling PyEval_InitThreads...
Now, for a longer answer:
I'm limiting my answer to Python extensions (as opposed to embedding Python). If we are only extending Python, then every entry point into the module comes from Python. By definition, this means we don't have to worry about being called from a non-Python context, which makes things a bit simpler.
If threads have NOT been initialized, then we know there is no GIL (no threads == no need for locking), and thus the warning "It is not safe to call this function when it is unknown which thread (if any) currently has the global interpreter lock" does not apply.
if (!PyEval_ThreadsInitialized())
{
    PyEval_InitThreads();
}
After calling PyEval_InitThreads(), a GIL is created and assigned... to our thread, which is the thread currently running Python code. So all is good.
Now, as for the worker "C" threads that we launch ourselves: they need to acquire the GIL before running any code that touches Python. The usual pattern is as follows:
// Do only non-Python things up to this point
PyGILState_STATE state = PyGILState_Ensure();
// Do Python-things here, like PyRun_SimpleString(...)
PyGILState_Release(state);
// ... and now back to doing only non-Python things
We don't have to worry about deadlock any more than with normal usage of extensions. When we entered our function, we had control over Python, so either we were not using threads (thus, no GIL), or the GIL was already assigned to us. When we give control back to the Python run-time by exiting our function, the normal processing loop will check the GIL and hand control off as appropriate to other requesting threads, including our worker threads waiting in PyGILState_Ensure().
All of this the reader probably already knows. However, the "proof is in the pudding". I've posted a very-minimally-documented example that I wrote today to learn for myself what the behavior actually was, and that things work properly. Sample Source Code on GitHub
I was learning several things with the example, including CMake integration with Python development, SWIG integration with both of the above, and Python behaviors with extensions and threads. Still, the core of the example allows you to:
... and all of this without any crashes or segfaults. At least on my system (Ubuntu Linux w/ GCC).