Find the set difference between two large arrays (matrices) in Python

天命终不由人 2020-12-06 16:54

I have two large 2-D arrays and I'd like to find their set difference, taking their rows as elements. In Matlab, the code for this would be setdiff(A,B,'rows').

3 Answers
  •  无人及你
    2020-12-06 17:43

    This should work, but it is currently broken in NumPy 1.6.1 because mergesort is unavailable for the view being created. It works in the pre-release 1.7.0 version. This should be the fastest approach, since the views don't copy any memory:

    >>> import numpy as np
    >>> a1 = np.array([[1,2,3],[4,5,6],[7,8,9]])
    >>> a2 = np.array([[4,5,6],[7,8,9],[1,1,1]])
    >>> a1_rows = a1.view([('', a1.dtype)] * a1.shape[1])
    >>> a2_rows = a2.view([('', a2.dtype)] * a2.shape[1])
    >>> np.setdiff1d(a1_rows, a2_rows).view(a1.dtype).reshape(-1, a1.shape[1])
    array([[1, 2, 3]])
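
The same view trick can be wrapped in a reusable function. This is only a sketch: the name `setdiff_rows` is hypothetical, and the calls to `np.ascontiguousarray` (plus casting the second array to the first array's dtype) are assumptions added here, because a view that changes the item size needs rows laid out contiguously in memory.

```python
import numpy as np

def setdiff_rows(a, b):
    """Return the sorted, unique rows of `a` that do not appear in `b`.

    Hypothetical helper, not part of NumPy.
    """
    # Assumption: force C-contiguous storage and a common dtype so that
    # each row can be reinterpreted as a single structured element.
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b, dtype=a.dtype)
    if a.ndim != 2 or b.ndim != 2 or a.shape[1] != b.shape[1]:
        raise ValueError("expected two 2-D arrays with the same number of columns")

    # One anonymous field per column, so a whole row becomes one element.
    row_dtype = [('', a.dtype)] * a.shape[1]
    a_rows = a.view(row_dtype).ravel()
    b_rows = b.view(row_dtype).ravel()

    # setdiff1d compares the structured elements, i.e. entire rows.
    diff = np.setdiff1d(a_rows, b_rows)

    # Convert the structured result back to a plain 2-D array.
    return diff.view(a.dtype).reshape(-1, a.shape[1])

a1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
a2 = np.array([[4, 5, 6], [7, 8, 9], [1, 1, 1]])
print(setdiff_rows(a1, a2))  # [[1 2 3]]
```

When no rows remain, the result is an empty array with shape `(0, ncols)`, so callers can still index its columns.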
    

    You can also do this in pure Python with sets of row tuples, but it might be slow for large arrays:

    >>> import numpy as np
    >>> a1 = np.array([[1,2,3],[4,5,6],[7,8,9]])
    >>> a2 = np.array([[4,5,6],[7,8,9],[1,1,1]])
    >>> a1_rows = set(map(tuple, a1))
    >>> a2_rows = set(map(tuple, a2))
    >>> a1_rows.difference(a2_rows)
    set([(1, 2, 3)])
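
The set-based version returns a Python set of tuples rather than an array. If an array is needed, the result can be converted back. This is a sketch with a hypothetical helper name, `setdiff_rows_py`. Sorting the rows is an assumption made here to get a deterministic order, since sets are unordered.

```python
import numpy as np

def setdiff_rows_py(a, b):
    """Rows of `a` not present in `b`, computed with Python sets.

    Hypothetical helper; expects `a` and `b` to be 2-D NumPy arrays.
    """
    # Tuples are hashable, so each row can be stored in a set.
    diff = set(map(tuple, a)) - set(map(tuple, b))
    # sorted() fixes the row order; reshape keeps the column count
    # even when the difference is empty.
    return np.array(sorted(diff), dtype=a.dtype).reshape(-1, a.shape[1])

a1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
a2 = np.array([[4, 5, 6], [7, 8, 9], [1, 1, 1]])
print(setdiff_rows_py(a1, a2))  # [[1 2 3]]
```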
    
