What is the difference between numpy.shares_memory and numpy.may_share_memory?

Asked by 旧时难觅i · 2021-01-02 04:06

Why does numpy.may_share_memory exist?
What makes it challenging to give an exact result?

Is numpy.may_share_memory a deprecated method?

2 Answers

    轻奢々 · 2021-01-02 04:35

    Quoting the release notes for 1.11.0:

    A new function np.shares_memory that can check exactly whether two arrays have memory overlap is added. np.may_share_memory also now has an option to spend more effort to reduce false positives.

    Semantically, this suggests that the older may_share_memory test was designed to give a fast, loose answer to whether memory is shared between the arrays. If the answer was a definite no, one could proceed accordingly. If the test was positive (possibly a false positive), care had to be taken. The new shares_memory function, on the other hand, allows exact checks. This costs more computation time, but can pay off in the long run: an answer free of false positives permits more optimizations. The looser check of may_share_memory only guarantees that it won't return false negatives.
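
    A sketch of that "no false negatives" guarantee (assuming NumPy >= 1.11): when the memory bounds of the two arrays are disjoint, even the loose check can already answer False with certainty.

    >>> import numpy as np
    >>> v = np.arange(6)
    >>> np.may_share_memory(v[:3], v[3:])   # disjoint bounds: a certain "no"
    False
    >>> np.shares_memory(v[:3], v[3:])
    False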

    Turning to the documentation of may_share_memory and shares_memory, both have a max_work keyword argument that tells numpy how strict a check the user wants.

    may_share_memory:

    max_work : int, optional
    
        Effort to spend on solving the overlap problem. See shares_memory for details. Default for may_share_memory is to do a bounds check.
    

    shares_memory:

    max_work : int, optional
    
        Effort to spend on solving the overlap problem (maximum number of candidate solutions to consider). The following special values are recognized:
    
        max_work=MAY_SHARE_EXACT (default)
    
            The problem is solved exactly. In this case, the function returns True only if there is an element shared between the arrays.
        max_work=MAY_SHARE_BOUNDS
    
            Only the memory bounds of a and b are checked.
    

    Judging by the docs, the two functions might call the same underlying machinery, with may_share_memory using a less strict default setting for the check.

    Let's take a peek at the implementation:

    static PyObject *
    array_shares_memory(PyObject *NPY_UNUSED(ignored), PyObject *args, PyObject *kwds)
    {
        /* exact check by default; raises when the problem is too hard */
        return array_shares_memory_impl(args, kwds, NPY_MAY_SHARE_EXACT, 1);
    }
    
    
    static PyObject *
    array_may_share_memory(PyObject *NPY_UNUSED(ignored), PyObject *args, PyObject *kwds)
    {
        /* bounds check by default; never raises, answers True instead */
        return array_shares_memory_impl(args, kwds, NPY_MAY_SHARE_BOUNDS, 0);
    }
    

    Both call the same underlying function, whose signature is

    static PyObject *
    array_shares_memory_impl(PyObject *args, PyObject *kwds, Py_ssize_t default_max_work,
                             int raise_exceptions)
    {}
    

    Without delving deeper into the source, it seems to me that shares_memory is an improvement over may_share_memory: with the appropriate keyword arguments, it can perform the same loose check as the older function, which remains available for convenience and backward compatibility.

    Disclaimer: this is the first time I have looked at this part of the source, and I didn't investigate array_shares_memory_impl further, so my impression may simply be wrong.
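
    If I read the raise_exceptions flag right, its effect is visible from Python (a sketch, assuming NumPy >= 1.11; in recent releases the exception also lives at numpy.exceptions.TooHardError): when a bounds-only check is inconclusive, may_share_memory swallows the "too hard" result and returns True, whereas shares_memory refuses to guess and raises.

    >>> import numpy as np
    >>> v = np.arange(6)
    >>> a, b = v[:4], v[2:]           # overlapping bounds
    >>> np.may_share_memory(a, b)     # inconclusive at bounds level -> True
    True
    >>> np.shares_memory(a, b, max_work=np.MAY_SHARE_BOUNDS)
    Traceback (most recent call last):
    ...
    numpy.TooHardError: Exceeded max_work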


    As for a specific example of the difference between the two methods (called with default arguments): as explained at the links above, may_share_memory only checks the memory bounds of the arrays. If the bounds are disjoint, there is no chance the arrays share memory. But if the bounds overlap, the arrays can still be independent!

    Simple example: a disjoint partitioning of a contiguous block of memory via slicing:

    >>> import numpy as np
    >>> v = np.arange(6)
    >>> x = v[::2]
    >>> y = v[1::2]
    >>> np.may_share_memory(x,y)
    True
    >>> np.shares_memory(x,y)
    False
    >>> np.may_share_memory(x,y,max_work=np.MAY_SHARE_EXACT)
    False
    

    As you can see, x and y are two disjoint slices of the same array. Their data ranges largely overlap (each is the other shifted by a single integer in memory), yet none of their elements coincide: one holds the even, the other the odd elements of the original contiguous block. So may_share_memory correctly reports that the arrays may share memory, but the stricter check shows that they actually don't.
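
    For contrast, a sketch of the opposite case: make the slices actually collide on an element and the exact check flips to True as well.

    >>> z = v[2:5]                  # elements 2, 3, 4
    >>> np.may_share_memory(x, z)
    True
    >>> np.shares_memory(x, z)      # x holds elements 0, 2, 4 -> 2 and 4 are shared
    True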


    As for the additional difficulty of computing the overlap exactly: the work can be traced to the worker function solve_may_share_memory, which also contains a lot of helpful comments about what's going on. In a nutshell, there's

    1. a quick check and return if the bounds don't overlap, otherwise
    2. a return with MEM_OVERLAP_TOO_HARD if we asked for loose checking (i.e. may_share_memory with default args), which is handled on the calling side as "we don't know, so return True"
    3. otherwise we actually solve the bounded Diophantine equations that the problem maps to

    So the work in point 3 above is what shares_memory (or, generally, any strict check) additionally has to do.
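
    As a worked instance of point 3 (a sketch, assuming 8-byte integers): for the x and y from the transcript above, "do x and y share an element?" becomes a bounded linear Diophantine problem over the byte offsets of their elements.

    >>> v = np.arange(6)            # itemsize 8: v occupies bytes 0..47
    >>> # x = v[::2] sits at byte offsets 16*i, y = v[1::2] at 8 + 16*j,
    >>> # so overlap means 16*i - 16*j == 8 for some 0 <= i, j <= 2.
    >>> # gcd(16, 16) == 16 does not divide 8, so no solution exists;
    >>> # a brute-force check over the bounded range confirms it:
    >>> any(16*i - 16*j == 8 for i in range(3) for j in range(3))
    False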
