Suppose I have a bunch of arrays, including x and y, and I want to check if they're equal. Generally, I can just use np.all(x == y).
Until this is implemented natively in numpy, you can write your own function and JIT-compile it with numba:
import numpy as np
import numba as nb

@nb.jit(nopython=True)
def arrays_equal(a, b):
    # Shapes must match before the elements are worth comparing.
    if a.shape != b.shape:
        return False
    # Walk both arrays element by element and bail out at the
    # first mismatch instead of comparing everything.
    for ai, bi in zip(a.flat, b.flat):
        if ai != bi:
            return False
    return True
a = np.random.rand(10, 20, 30)
b = np.random.rand(10, 20, 30)
%timeit np.all(a==b) # 100000 loops, best of 3: 9.82 µs per loop
%timeit arrays_equal(a, a) # 100000 loops, best of 3: 9.89 µs per loop
%timeit arrays_equal(a, b) # 100000 loops, best of 3: 691 ns per loop
Worst-case performance (arrays equal) is equivalent to np.all, and in the case of early stopping the compiled function has the potential to outperform np.all by a lot.
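Note that numba compiles arrays_equal the first time it is called with a given argument type, so the very first invocation is much slower than the numbers above; %timeit's repeated runs amortize that one-off compilation cost away.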
Adding short-circuit logic to array comparisons is apparently being discussed in numpy's GitHub issues, and will thus presumably be available in a future version of numpy.
Well, not really an answer, as I haven't checked whether it short-circuits, but: assert_array_equal. From the documentation:

Raises an AssertionError if two array_like objects are not equal.

Try/except it if you're not on a performance-sensitive code path. Or follow the underlying source code; maybe it's efficient.
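A minimal sketch of the try/except approach (the wrapper name arrays_match is my own):

import numpy as np
from numpy.testing import assert_array_equal

def arrays_match(a, b):
    # assert_array_equal raises AssertionError on mismatch;
    # wrap it to get a plain boolean back.
    try:
        assert_array_equal(a, b)
        return True
    except AssertionError:
        return False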
As Thomas Kühn wrote in a comment to your post, array_equal is a function that should solve the problem. It is described in NumPy's API reference.
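For reference, a minimal usage sketch (np.array_equal is the real NumPy function; the sample arrays are mine):

import numpy as np

x = np.array([1, 2, 3])
y = np.array([1, 2, 3])

np.array_equal(x, y)      # True
np.array_equal(x, y[:2])  # False: a shape mismatch returns False, no exception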
You could iterate over all elements of the arrays and check whether they are equal. If the arrays are most likely not equal, this will return much faster than np.all. Something like this:
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 3, 4])

are_equal = True
for i in range(a.size):
    # Stop at the first mismatch instead of comparing every element.
    if a[i] != b[i]:
        are_equal = False
        break

if are_equal:
    print("The arrays are equal")
else:
    print("The arrays are not equal")
Probably someone who understands the underlying data structure could optimize this or explain whether it's reliable/safe/good practice, but it seems to work.
np.all(a==b)
Out[]: True
memoryview(a.data)==memoryview(b.data)
Out[]: True
%timeit np.all(a==b)
The slowest run took 10.82 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 6.2 µs per loop
%timeit memoryview(a.data)==memoryview(b.data)
The slowest run took 8.55 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.85 µs per loop
If I understand this correctly, ndarray.data exposes the underlying data buffer, and memoryview wraps it in a native Python type whose comparison can short-circuit over the buffer. I think.
EDIT: Further testing shows it may not be as big a time improvement as shown; previously a = b = np.eye(5).
a=np.random.randint(0,10,(100,100))
b=a.copy()
%timeit np.all(a==b)
The slowest run took 6.70 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 17.7 µs per loop
%timeit memoryview(a.data)==memoryview(b.data)
10000 loops, best of 3: 30.1 µs per loop
np.all(a==b)
Out[]: True
memoryview(a.data)==memoryview(b.data)
Out[]: True
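Pulling the thread together, here is a minimal sketch (the name arrays_equal_fast and its structure are my own) that front-loads the cheap O(1) checks before paying for a full element-wise scan:

import numpy as np

def arrays_equal_fast(a, b):
    # Cheap O(1) checks first: identity, then metadata.
    if a is b:
        return True
    if a.shape != b.shape or a.dtype != b.dtype:
        return False
    # Full scan; np.array_equal does not short-circuit,
    # but at this point a real comparison is unavoidable.
    return np.array_equal(a, b)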