Safe writing to variable in cython c wrapper within two python processes or distinct memory for python processes

ⅰ亾dé卋堺 submitted on 2021-01-29 09:53:40

Question


I am creating a wrapper over a C library that receives some financial data, and I want to collect it into a Python data type (a dict with a list of field names and a list of lists holding the financial data fields).

At the C level there is a function that starts "listening" on a port, and whenever an event arrives a user-defined function is called. This function is written in Cython. A simplified example of such a function:

cdef void default_listener(const event_data_t* data, int data_count, void* user_data):

    cdef trade_t* trades = <trade_t*>data  # cast received data to the expected type
    cdef dict py_data = <dict>user_data  # cast user_data back to its original type (a dict in our case)

    for i in range(data_count):
        # append the fields of each received struct to the list
        # in the dict that we passed to the function
        py_data['data'].append([trades[i].price,
                                trades[i].size,
                                ]
                               )

The problem: when only one Python process runs this function, everything works. But if I start a second Python process and run the same function, one of the processes is terminated after an unpredictable amount of time. I suppose this happens because two functions called simultaneously in different processes may try to write to the same part of memory. Could this be the case?

If so, is there any way to prevent two processes from using the same memory? Or can some lock be acquired before the Cython code starts to write?

P.S.: I have also read this article, and according to it each Python process is allocated its own memory that does not overlap with that of other processes. But it is unclear to me whether this allocated memory is also what the underlying C functions use, or whether those functions have access to other regions that may overlap.
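The isolation described in the P.S. can be checked with a small experiment. This is only a sketch: it uses the standard `subprocess` module rather than the C library from the question, and `CHILD_CODE`, `run_child`, and the sample values are hypothetical names made up for illustration. Each child interpreter receives a copy of the parent's dict, appends to that copy, and reports back; the parent's own dict is never touched, because separate OS processes do not share an address space unless shared memory is set up explicitly. The same holds for memory used by C code loaded into each process.

```python
import json
import subprocess
import sys

# Code run by each child interpreter: read a dict from stdin, append one
# row to its private copy, and print the resulting number of rows.
CHILD_CODE = """
import json, sys
py_data = json.load(sys.stdin)
py_data['data'].append([101.5, 10])
print(len(py_data['data']))
"""

def run_child(py_data):
    """Run a separate Python process on a serialized copy of py_data."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=json.dumps(py_data),
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())

# The parent's own container, analogous to the dict passed as user_data.
parent_data = {"fields": ["price", "size"], "data": []}

# Two independent processes each modify only their own copy.
child_counts = [run_child(parent_data) for _ in range(2)]
```

Each child reports one row, and `parent_data['data']` stays empty, which suggests the crash is more likely caused by something inside a single process than by two processes writing to the same memory.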


Answer 1:


I'm taking a guess at the answer based on your comment - if it's wrong then I'll delete it, but I think it's likely enough to be right to be worth posting as an answer.

Python has a locking mechanism known as the Global Interpreter Lock (or GIL). It ensures that only one thread at a time executes Python code or touches Python objects, so multiple threads can't modify the same memory simultaneously (including memory internal to Python, which may not be obvious to the user).

Your Cython code will be working on the assumption that its thread holds the GIL. I strongly suspect that this isn't true, and so performing any operations on a Python object will likely cause a crash. One way to deal with this would be to follow this section of documentation in the C code that calls the Cython code. However, I suspect it's easier to handle in Cython.

First tell Cython that the function is "nogil" - it does not require the GIL:

cdef void default_listener(const event_data_t* data, int data_count, void* user_data) nogil:

If you try to compile now it will fail, since you use Python types within the function. To fix this, claim the GIL within your Cython code.

cdef void default_listener(...) nogil:
    with gil:
        default_listener_impl(...)

What I've done is put the implementation in a separate function that does require the GIL (i.e. doesn't have a nogil attached). The reason for this is that you can't put cdef statements in the with gil section (as you say in your comment) - they have to be outside it. However, you can't put cdef dict outside it, because it's a Python object. Therefore a separate function is the easiest solution. The separate function looks almost exactly like default_listener does now.


It's worth knowing that this isn't a complete locking mechanism. It's really only there to protect the Python internals from being corrupted: an ordinary Python thread will release and regain the GIL periodically and automatically, and that may happen while you're in the middle of an operation. Cython won't release the GIL unless you tell it to (in this case, at the end of the with gil: block), so it does hold an exclusive lock during this time. If you need finer control of locking, you may want to look at either Python's threading module or wrapping some C locking library.
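As a rough illustration of the finer-grained locking mentioned above, here is a minimal pure-Python sketch using `threading.Lock`. It is not the code from the question: `py_data`, `data_lock`, `on_trades`, and `snapshot` are hypothetical names, and ordinary Python threads stand in for the callbacks that the C library would deliver.

```python
import threading

# Hypothetical stand-in for the dict passed to the listener as user_data.
py_data = {"fields": ["price", "size"], "data": []}
data_lock = threading.Lock()

def on_trades(trades):
    """Append a whole batch of (price, size) rows while holding the lock."""
    with data_lock:
        for price, size in trades:
            py_data["data"].append([price, size])

def snapshot():
    """Return a consistent copy of the rows collected so far."""
    with data_lock:
        return [row[:] for row in py_data["data"]]

# Simulate four listener threads delivering batches concurrently.
threads = [
    threading.Thread(target=on_trades, args=([(100.0 + i, i)] * 1000,))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

rows = snapshot()
```

Because the lock is held for the whole batch, rows from different batches never interleave, and a reader calling `snapshot()` never sees a half-written batch. The GIL alone does not guarantee this, since a thread can be switched out between two `append` calls.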



Source: https://stackoverflow.com/questions/57805481/safe-writing-to-variable-in-cython-c-wrapper-within-two-python-processes-or-dist
