Check if two numpy arrays are identical

轻奢々 2020-12-10 15:47

Suppose I have a bunch of arrays, including x and y, and I want to check if they're equal. Generally, I can just use np.all(x == y) (

7 answers
  • 2020-12-10 16:26

    Until this is implemented in numpy natively you can write your own function and jit-compile it with numba:

    import numpy as np
    import numba as nb
    
    
    @nb.jit(nopython=True)
    def arrays_equal(a, b):
        if a.shape != b.shape:
            return False
        for ai, bi in zip(a.flat, b.flat):
            if ai != bi:
                return False
        return True
    
    
    a = np.random.rand(10, 20, 30)
    b = np.random.rand(10, 20, 30)
    
    
    %timeit np.all(a==b)  # 100000 loops, best of 3: 9.82 µs per loop
    %timeit arrays_equal(a, a)  # 100000 loops, best of 3: 9.89 µs per loop
    %timeit arrays_equal(a, b)  # 100000 loops, best of 3: 691 ns per loop
    

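    Worst-case performance (the arrays are equal, so every element is checked) is on par with np.all. When the loop can stop early, the compiled function can outperform np.all by a wide margin: roughly 14 times faster in the timings above (691 ns versus 9.82 µs).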
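
    For readers who cannot install numba, the same early-exit idea can be approximated in pure NumPy by comparing the arrays block by block. This is only a sketch: the name chunked_equal and the chunk parameter are hypothetical, not part of NumPy, and a sensible block size would have to be found by measurement.

```python
import numpy as np


def chunked_equal(a, b, chunk=4096):
    """Return True if a and b have the same shape and elements.

    Hypothetical helper: compares block by block and stops at the
    first block that contains a difference.
    """
    if a.shape != b.shape:
        return False
    # reshape(-1) gives a flat view for contiguous arrays (a copy otherwise)
    flat_a, flat_b = a.reshape(-1), b.reshape(-1)
    for start in range(0, flat_a.size, chunk):
        stop = start + chunk
        if not np.array_equal(flat_a[start:stop], flat_b[start:stop]):
            return False  # early exit: later blocks are never touched
    return True
```

    The early exit only pays off when differences tend to appear near the start of the arrays. For equal arrays every block is still compared, so the loop adds overhead compared with a single np.all(a == b).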

  • 2020-12-10 16:34

    Adding short-circuit logic to array comparisons is apparently being discussed in the NumPy repository on GitHub, so it may become available in a future version of NumPy.

  • 2020-12-10 16:38

    Well, not really an answer, as I haven't checked whether it short-circuits, but:

    np.testing.assert_array_equal.

    From the documentation:

    Raises an AssertionError if two array_like objects are not equal.

    Wrap it in a try/except if it is not on a performance-sensitive code path.

    Or follow the underlying source code, maybe it's efficient.
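
    A minimal sketch of the try/except approach described above. The wrapper name equal_via_assert is hypothetical; only numpy.testing.assert_array_equal comes from NumPy.

```python
import numpy as np
from numpy.testing import assert_array_equal


def equal_via_assert(a, b):
    """Hypothetical wrapper: turn the AssertionError into a boolean."""
    try:
        assert_array_equal(a, b)
    except AssertionError:
        return False
    return True
```

    One difference from np.all(a == b) is worth knowing: assert_array_equal treats NaNs in the same positions as equal, so the two checks can disagree on arrays that contain NaN.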

  • 2020-12-10 16:43

    As Thomas Kühn wrote in a comment on your post, np.array_equal is a function that should solve the problem. It is described in NumPy's API reference.
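
    A short usage sketch of np.array_equal, with made-up sample arrays. It returns a plain boolean and reports False for arrays of different shapes.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

same = np.array_equal(x, x.copy())                          # same shape, same values
diff_value = np.array_equal(x, np.array([[1, 2], [3, 5]]))  # one element differs
diff_shape = np.array_equal(x, np.array([1, 2, 3, 4]))      # same values, other shape

print(same, diff_value, diff_shape)
```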

  • 2020-12-10 16:45

    You could iterate over all elements of the arrays and check whether they are equal. If the arrays are likely to differ early on, the loop can stop sooner than np.all, which always compares every element. Something like this:

    import numpy as np
    
    a = np.array([1, 2, 3])
    b = np.array([1, 3, 4])
    
    areEqual = True
    
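    # Assumes a and b are 1-D arrays of the same length.
    for x in range(a.size):  # range(0, a.size-1) would skip the last element
        if a[x] != b[x]:
            areEqual = False
            break  # stop at the first mismatch
        else:
            print("a[x] is equal to b[x]\n")
    
    if areEqual:
        print("The tables are equal\n")
    else:
        print("The tables are not equal\n")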
    
  • 2020-12-10 16:47

    Probably someone who understands the underlying data structure could optimize this or explain whether it's reliable/safe/good practice, but it seems to work.

    np.all(a==b)
    Out[]: True
    
    memoryview(a.data)==memoryview(b.data)
    Out[]: True
    
    %timeit np.all(a==b)
    The slowest run took 10.82 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 6.2 µs per loop
    
    %timeit memoryview(a.data)==memoryview(b.data)
    The slowest run took 8.55 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 1.85 µs per loop
    

    If I understand this correctly, ndarray.data creates a pointer to the data buffer and memoryview creates a native python type that can be short-circuited out of the buffer.

    I think.

    EDIT: Further testing shows the speedup may not be as large as the timings above suggest. Previously I had a = b = np.eye(5), so both names referred to the same array. With two separate arrays:

    a=np.random.randint(0,10,(100,100))
    
    b=a.copy()
    
    %timeit np.all(a==b)
    The slowest run took 6.70 times longer than the fastest. This could mean that an intermediate result is being cached.
    10000 loops, best of 3: 17.7 µs per loop
    
    %timeit memoryview(a.data)==memoryview(b.data)
    10000 loops, best of 3: 30.1 µs per loop
    
    np.all(a==b)
    Out[]: True
    
    memoryview(a.data)==memoryview(b.data)
    Out[]: True
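
    Since the answer above is unsure how reliable the buffer comparison is, here is a cautious sketch that only uses it when both arrays have the same dtype and shape and are C-contiguous, and falls back to np.array_equal otherwise. The name buffers_equal and the guard conditions are assumptions made for illustration, not an established recipe.

```python
import numpy as np


def buffers_equal(a, b):
    """Hypothetical helper: compare arrays through their memory buffers."""
    # Assumption: treat arrays with different dtype or shape as unequal.
    if a.dtype != b.dtype or a.shape != b.shape:
        return False
    # Assumption: only use the buffer comparison for C-contiguous data.
    if not (a.flags.c_contiguous and b.flags.c_contiguous):
        return bool(np.array_equal(a, b))
    return memoryview(a) == memoryview(b)
```

    As the timings above show, the buffer comparison is not guaranteed to be faster than np.all(a == b), so it is worth measuring on realistic data before relying on it.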
    