问题
I am trying to remove both the row i and column i when both the row i and column i contains all 0s. For example in this case we can see that row 0 is all zeros and column 0 is all zeros and thus row and column 0 is removed. Same with row column pair 2 and 4. Row 1 is all zeros but column 1 is not so neither are removed.
[0,0,0,0,0]
[0,1,0,1,0]
[0,0,0,0,0]
[0,0,0,0,0]
[0,0,0,0,0]
would become
[1,1]
[0,0]
Another example:
[0,0,1,0,0,1]
[0,0,0,0,0,0]
[0,0,0,0,0,0]
[0,0,0,0,0,0]
[0,0,0,0,0,0]
[0,0,1,0,1,0]
would change to:
[0,1,0,1]
[0,0,0,0]
[0,0,0,0]
[0,1,1,0]
This is the code that I am using to compute:
def remove(matrix):
for i, x in reversed(list(enumerate(matrix))):
if np.all(matrix == 0, axis=0)[i] and np.all(matrix == 0, axis=1)[i]:
matrix = np.delete(matrix,i,axis=0)
matrix = np.delete(matrix,i,axis=1)
return matrix
After testing this line is taking the most time by far:
if np.all(matrix == 0, axis=0)[i] and np.all(matrix == 0, axis=1)[i]:
Is there a more appropriate way to test a row and column in this way? The matrix that I am using is a sparse binary matrix. I am not using any sparse matrix classes just ndarray.
回答1:
Vectorized approach with masking -
def remove_vectorized(a):
mask = a==0
m_row = ~mask.all(1)
m_col = ~mask.all(0)
comb_mask = m_row | m_col
return a[comb_mask][:,comb_mask] #or a[np.ix_(comb_mask, comb_mask)]
Sample runs
Case #1 :
In [485]: a
Out[485]:
array([[0, 0, 0, 0, 0],
[0, 1, 0, 1, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]])
In [486]: remove_vectorized(a)
Out[486]:
array([[1, 1],
[0, 0]])
Case #2 :
In [489]: a
Out[489]:
array([[0, 0, 1, 0, 0, 1],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 1, 0]])
In [490]: remove_vectorized(a)
Out[490]:
array([[0, 1, 0, 1],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 1, 1, 0]])
回答2:
One thing you can do- evaluate the boolean matrix "matrix == 0" outside of your for loop, and use that matrix instead of reevaluating it every run.
It'd be something like:
def remove(matrix):
binary_matrix = matrix == 0
for i, x in reversed(list(enumerate(matrix))):
if np.all(binary_matrix , axis=0)[i] and np.all(binary_matrix , axis=1)[i]:
matrix = np.delete(matrix,i,axis=0)
matrix = np.delete(matrix,i,axis=1)
binary_matrix = np.delete(binary_matrix ,i,axis=0)
binary_matrix = np.delete(binary_matrix ,i,axis=1)
return matrix
来源:https://stackoverflow.com/questions/46774681/efficiently-test-matrix-rows-and-columns-with-numpy