Finding Non-Zero Values/Indexes in Numpy

Question

I have a fairly large NumPy array with the shape (12388, 4). The first two values in each row are coordinates and the last two are key values. Some of them are zero. I want to filter the array and find all indexes where both of the last two values are non-zero. My code looks like this:

slice_index,_ = np.where((slice[:,2:4]!=0))
slice_nonzero_values = slice[slice_index]

The shape of the resulting array slice_nonzero_values is (18550, 4). So something must have gone wrong, as the resulting array is bigger than the original one. Looking at the CSV, I realized np.where gives me the same index back multiple times if slice[:,2] and slice[:,3] are both non-zero. So I tried to include np.unique:

slice_index,_ = np.where((slice[:,2:4]!=0))
slice_index_unique = np.unique(slice_index)
slice_nonzero_values = slice[slice_index_unique]

This results in a shape of (9669, 4). This looks a lot better. However, to be sure everything is fine now, I made this for-loop:

    test = []
    test_index = []
    for index, i in enumerate(slice):
        if i[2]!=0 or i[3]!=0:
            test.append(i)
            test_index.append(index)
    test = np.array(test)
    test_index = np.array(test_index)

This loop results in the array test with the shape (8881, 4). Now I am completely confused about which of the two ways is correct. Based on the loop's logic, the test array must be the right one. However, this is just one slice array out of literally thousands, so I can't leave the for-loop there. To summarize: I want to filter the slice array and get all entries which have non-zeros in either of the last two columns. In other words, if both values (slice[:,2] and slice[:,3]) are zero, the row is dropped. If only one of them is zero and the other is not, that's fine.

Here is a sample of the slice array:

   array([[0.01032591, 0. , 0.               , 0.        ],
   [0.03256559, 0.00890732, 5.0000000e+00    , 0.        ],
   [0.0468626 , 0.01543951, 0.               , 0.        ],
   ...,
   [0.13899946, 0.8847985 , 0.               , 0.        ],
   [0.13899946, 0.8847985 , 4.0000000e+00    , 5.3900000e+02],
   [0.13899946, 0.8847985 , 0.               , 0.        ]], dtype=float32)
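The duplicated indexes described in the question can be reproduced on a tiny array. The sketch below uses a hypothetical three-row array named demo (not the original data) and shows that np.where on a 2-D mask returns one row index per True element, so a row with two non-zero key values is listed twice:

```python
import numpy as np

# Hypothetical 3-row stand-in for the real array (not the original data)
demo = np.array([[0.1, 0.2, 0.0, 0.0],
                 [0.3, 0.4, 5.0, 0.0],
                 [0.5, 0.6, 4.0, 539.0]], dtype=np.float32)

# The mask has shape (3, 2); np.where returns one (row, col) pair per True element
rows, cols = np.where(demo[:, 2:4] != 0)
print(rows)  # [1 2 2]  -> row 2 appears twice because both of its key values are non-zero
print(cols)  # [0 0 1]

# Indexing with the raw row indexes therefore repeats row 2
print(demo[rows].shape)  # (3, 4), although only 2 distinct rows match
```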

Answer 1:

Here's a working demo. Create test data:

import numpy as np

X = np.random.rand(10,4)
X = np.vstack([X, np.zeros([2,4])])

>>> X
array([[0.09889965, 0.01169015, 0.30886119, 0.40204571],
       [0.67277149, 0.01654403, 0.17710642, 0.54201684],
       # ...
       [0.        , 0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.        ]])

Find the vectors whose last two numbers are not both zero:

idx = np.where(np.sum(X[:,2:], axis=1) != 0)[0]

# alternatively, use np.any
idx = np.where(np.any(X[:,2:], axis=1))[0]

Retrieve filtered vectors:

X_none_zeros = X[idx]

>>> X_none_zeros
array([[0.09889965, 0.01169015, 0.30886119, 0.40204571],
       # ...
       [0.78279739, 0.84191242, 0.31685306, 0.54906034]])

>>> X_none_zeros.shape
(10, 4)

>>> X.shape
(12, 4)
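As a possible variation, the np.where call can be skipped by indexing directly with a boolean mask. This is a minimal sketch on hypothetical data (the names X_small, mask, and kept are illustrative, not part of the answer above); np.flatnonzero recovers the row indexes when they are needed as well:

```python
import numpy as np

# Hypothetical data: the last two columns are the key values
X_small = np.array([[0.1, 0.2, 0.0, 0.0],   # both zero       -> dropped
                    [0.3, 0.4, 5.0, 0.0],   # one non-zero    -> kept
                    [0.5, 0.6, 0.0, 7.0],   # one non-zero    -> kept
                    [0.7, 0.8, 4.0, 9.0]])  # both non-zero   -> kept

# True for every row with at least one non-zero value in the last two columns
mask = (X_small[:, 2:] != 0).any(axis=1)

kept = X_small[mask]            # filtered rows
idx = np.flatnonzero(mask)      # matching row indexes, each listed once

print(kept.shape)  # (3, 4)
print(idx)         # [1 2 3]
```

Because the mask has exactly one entry per row, no index can appear twice, so np.unique is not needed.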

Explanation: the actual code is just two lines:

# take the last 2 numbers of each vector in X and sum them row-wise,
# which gives one value per row: [s1, s2, s3, ...];
# then keep the indexes of the rows whose sum is non-zero
idx = np.where(np.sum(X[:,2:], axis=1) != 0)[0]
# retrieve the matching vectors
X_none_zeros = X[idx]
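One caveat worth keeping in mind: the sum-based test and the np.any test agree only as long as the key values cannot cancel each other out. The sample in the question appears to contain only non-negative values, so this may not matter there, but the following sketch on hypothetical data (the array Y is made up for illustration) shows where the two diverge:

```python
import numpy as np

# Hypothetical data with a row whose non-zero key values sum to zero
Y = np.array([[0.1, 0.2, 1.0, -1.0],   # non-zero entries that cancel
              [0.3, 0.4, 0.0,  0.0],   # both zero
              [0.5, 0.6, 2.0,  0.0]])  # one non-zero

by_sum = np.where(np.sum(Y[:, 2:], axis=1) != 0)[0]
by_any = np.where(np.any(Y[:, 2:] != 0, axis=1))[0]

print(by_sum)  # [2]    -> row 0 is missed because 1.0 + (-1.0) == 0
print(by_any)  # [0 2]  -> row 0 is kept, matching the loop's "or" condition
```

If negative values are possible, the np.any form is the safer of the two.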

Source: https://stackoverflow.com/questions/56391656/finding-non-zero-values-indexes-in-numpy

Tags

python

python-3.x

numpy