UPDATE Well, it looks like adding PyEval_InitThreads() before the call to PyGILState_Ensure() does the trick. In my haste to figure things out I incorrectly
Python expects a certain amount of initialization to be done by the main thread before anything attempts to call back in from a subthread.
If the main thread is an application that is embedding Python, then it should call PyEval_InitThreads() immediately after calling Py_Initialize().
If the main thread is instead the Python interpreter itself (as seems to be the case here), then the module using the multithreaded extension module should include an "import threading" early to ensure that PyEval_InitThreads() is called correctly before any subthreads are spawned.