Filter a TensorFlow array with a specific condition over a NumPy array

Question

I have a TensorFlow array named tf_array and a NumPy array named np_array. I want to find specific rows in tf_array based on np_array.

    tf_array = tf.constant(
                [[9.968594,  8.655439,  0.,        0.       ],
                 [0.,        8.3356,    0.,        8.8974   ],
                 [0.,        0.,        6.103182,  7.330564 ],
                 [6.609862,  0.,        3.0614321, 0.       ],
                 [9.497023,  0.,        3.8914037, 0.       ],
                 [0.,        8.457685,  8.602337,  0.       ],
                 [0.,        0.,        5.826657,  8.283971 ]])

I also have np_array:

    np_array = np.matrix(
        [[2, 5, 1],
         [1, 6, 4],
         [0, 0, 0],
         [2, 3, 6],
         [4, 2, 4]])

Now I want to keep the elements of tf_array for which some combination of n of their row indexes (here n is 2) appears together in a row of np_array. What does that mean?

For example, in the first column of tf_array, the rows that hold a nonzero value are (0, 3, 4). Is there any row in np_array that contains one of the pairs (0,3), (0,4) or (3,4)? There is no such row, so all the elements in that column become zero.

For the second column of tf_array the nonzero rows are (0, 1, 5), which gives the pairs (0,1), (0,5) and (1,5). The pair (1,5) appears in the first row of np_array, so the values at rows 1 and 5 of that column are kept.
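The rule can be checked by brute force for a single column. The sketch below is only an illustration of the rule, not an efficient solution; the names values, index_rows and matching_combinations are hypothetical, and the data is held in plain NumPy arrays rather than in a TensorFlow tensor.

```python
from itertools import combinations

import numpy as np

# Same values as tf_array in the question, held in NumPy for illustration.
values = np.array([
    [9.968594, 8.655439, 0.0, 0.0],
    [0.0, 8.3356, 0.0, 8.8974],
    [0.0, 0.0, 6.103182, 7.330564],
    [6.609862, 0.0, 3.0614321, 0.0],
    [9.497023, 0.0, 3.8914037, 0.0],
    [0.0, 8.457685, 8.602337, 0.0],
    [0.0, 0.0, 5.826657, 8.283971],
])
index_rows = np.array([[2, 5, 1], [1, 6, 4], [0, 0, 0], [2, 3, 6], [4, 2, 4]])


def matching_combinations(column, n=2):
    """List the size-n combinations of nonzero row indexes of `column`
    that appear together in at least one row of index_rows."""
    nonzero_rows = np.flatnonzero(values[:, column]).tolist()
    row_sets = [set(row) for row in index_rows.tolist()]
    return [
        combo
        for combo in combinations(nonzero_rows, n)
        if any(set(combo) <= row_set for row_set in row_sets)
    ]


print(matching_combinations(0))  # [] -> the first column is zeroed
print(matching_combinations(1))  # [(1, 5)] -> rows 1 and 5 are kept
```

Enumerating every combination grows quickly with the number of nonzero rows, which is why a vectorized approach is needed for large data.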

So the final result should be like this:

[[0.        0.        0.        0.       ]
 [0.        8.3356    0.        8.8974   ]
 [0.        0.        6.103182  7.330564 ]
 [0.        0.        3.0614321 0.       ]
 [0.        0.        3.8914037 0.       ]
 [0.        8.457685  8.602337  0.       ]
 [0.        0.        5.826657  8.283971 ]]

I am looking for a very efficient approach, as I have a large amount of data.

Update 1

With the code below I can get the following mask, which is True where there is a value and False where the entry is zero:

[[ True  True False False]
 [False  True False  True]
 [False False  True  True]
 [ True False  True False]
 [ True False  True False]
 [False  True  True False]
 [False False  True  True]]

with tf.Session() as sess:
    where = tf.not_equal(tf_array, 0.0)
    print(sess.run(where))

But how can I compare this matrix with np_array?

Thank you in advance!

Answer 1:

Here is the solution from https://stackoverflow.com/a/56510832/7207392 with the necessary modifications. For the sake of simplicity I use np.array for all data. I'm no TensorFlow expert, so if translating it is not entirely straightforward, you'll have to ask somebody else how to do it.

import numpy as np

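def f(a1, a2, n):
    N, M = a1.shape
    # Append an all-zero row; index N then acts as a "no value" placeholder.
    a1p = np.concatenate([a1, np.zeros((1, a1.shape[1]), a1.dtype)], axis=0)
    # Sort each index row and redirect repeated indexes to the zero row,
    # so that a duplicate index is not counted twice.
    a2 = np.sort(a2, axis=1)
    a2[:, 1:][a2[:, 1:] == a2[:, :-1]] = N
    # For every (index row y, column x), count the nonzero values selected.
    y, x = np.where(np.count_nonzero(a1p[a2], axis=1) >= n)
    # Copy the selected entries of each matching column into the output.
    out = np.zeros_like(a1p)
    out[a2[y], x[:, None]] = a1p[a2[y], x[:, None]]
    # Drop the placeholder row again.
    return out[:-1]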

a1 = np.array(
    [[9.968594,  8.655439,  0.,        0.       ],
     [0.,        8.3356,    0.,        8.8974   ],
     [0.,        0.,        6.103182,  7.330564 ],
     [6.609862,  0.,        3.0614321, 0.       ],
     [9.497023,  0.,        3.8914037, 0.       ],
     [0.,        8.457685,  8.602337,  0.       ],
     [0.,        0.,        5.826657,  8.283971 ]])

a2 = np.array(
 [[2, 5, 1],
  [1, 6, 4],
  [0, 0, 0],
  [2, 3, 6],
  [4, 2, 4]])

print(f(a1,a2,2))

Output:

[[0.        0.        0.        0.       ]
 [0.        8.3356    0.        8.8974   ]
 [0.        0.        6.103182  7.330564 ]
 [0.        0.        3.0614321 0.       ]
 [0.        0.        3.8914037 0.       ]
 [0.        8.457685  8.602337  0.       ]
 [0.        0.        5.826657  8.283971 ]]
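To see why the function sorts the index rows and redirects duplicates to an extra zero row, it can help to look at the intermediate arrays. The following is a sketch that repeats the first steps of f on the same data; the names a2s, naive and counts are introduced here for illustration only.

```python
import numpy as np

a1 = np.array([
    [9.968594, 8.655439, 0.0, 0.0],
    [0.0, 8.3356, 0.0, 8.8974],
    [0.0, 0.0, 6.103182, 7.330564],
    [6.609862, 0.0, 3.0614321, 0.0],
    [9.497023, 0.0, 3.8914037, 0.0],
    [0.0, 8.457685, 8.602337, 0.0],
    [0.0, 0.0, 5.826657, 8.283971],
])
a2 = np.array([[2, 5, 1], [1, 6, 4], [0, 0, 0], [2, 3, 6], [4, 2, 4]])
N = a1.shape[0]

# Counting without removing duplicates: [0, 0, 0] counts row 0 three times.
naive = np.count_nonzero(a1[a2], axis=1)
print(naive[2])  # [3 3 0 0]

# Sort each row, then point every repeated index at the placeholder row N.
a2s = np.sort(a2, axis=1)
a2s[:, 1:][a2s[:, 1:] == a2s[:, :-1]] = N
print(a2s)

# With the zero row appended, the placeholder index never adds to the count.
a1p = np.concatenate([a1, np.zeros((1, a1.shape[1]), a1.dtype)], axis=0)
counts = np.count_nonzero(a1p[a2s], axis=1)
print(counts)  # one row per index row, one column per column of a1
```

Entries of counts that are at least n mark the (index row, column) pairs whose values are copied to the output. Without the placeholder, the rows [0, 0, 0] and [4, 2, 4] would reach a count of 2 or more from a single distinct index.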

Answer 2:

One efficient approach you can try is to build bit flags that record which values are present in each row: for example, (0,3,4) becomes 1<<0 | 1<<3 | 1<<4. This gives you an array of flag values. Check whether the << and | operators work in NumPy. Do the same for the other array; I guess tf arrays are just wrapped NumPy arrays. Once you have the two arrays of flags, take the bitwise "and" of them. Where your condition holds for a row, the result has at least two nonzero bits. Counting the bits can also be done efficiently; it is worth searching for existing techniques.

This, however, won't work with floats: you'll need to convert those to fairly small ints.
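As a small illustration of the flag idea with plain Python integers, using index tuples taken from the question (the helper name to_flags is hypothetical):

```python
def to_flags(indexes):
    """Pack a collection of small non-negative ints into one integer bit mask."""
    flags = 0
    for i in indexes:
        flags |= 1 << i
    return flags


first_column = to_flags((0, 3, 4))   # nonzero rows of the first column
second_column = to_flags((0, 1, 5))  # nonzero rows of the second column
first_row = to_flags((2, 5, 1))      # first row of np_array

# The number of set bits in the "and" is the number of shared indexes.
print(bin(first_column & first_row).count("1"))   # 0 shared indexes
print(bin(second_column & first_row).count("1"))  # 2 shared indexes: 1 and 5
```

A repeated index such as (0, 0, 0) sets the same bit each time, so duplicates collapse without extra handling.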

import numpy as np

arr_one = np.array(
 [[2, 5, 1],
  [1, 6, 4],
  [0, 0, 0],
  [2, 3, 6],
  [4, 2, 4]])

arr_two = np.array(
 [[2, 0, 7],
  [1, 3, 4],
  [5, 5, 6],
  [1, 3, 6],
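  [4, 2, 4]])

# Bits set by the first column only, to show what the shift produces.
print('1 << arr_one.T[0] ', 1 << arr_one.T[0])

# One flag value per row: bit k is set when the row contains the value k.
arr_one_flags = 1 << arr_one.T[0] | 1 << arr_one.T[1] | 1 << arr_one.T[2]
print('arr_one_flags ', arr_one_flags)

arr_two_flags = 1 << arr_two.T[0] | 1 << arr_two.T[1] | 1 << arr_two.T[2]

# Bits that are set in both flag values, i.e. values shared by both rows.
arr_and = arr_one_flags & arr_two_flags
print('arr_and ', arr_and)


def get_bit_count(value):
    """Count the set bits by clearing the lowest set bit until none remain."""
    n = 0
    while value:
        n += 1
        value &= value - 1
    return n


arr_matches = np.array([get_bit_count(x) for x in arr_and])
print('arr_matches ', arr_matches)

# Keep the rows of arr_two that share at least two values with arr_one.
arr_two_filtered = arr_two[arr_matches > 1]
print('arr_two_filtered ', arr_two_filtered)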
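The example above compares two index arrays row by row. One possible way to carry the same idea over to the data from the question is sketched below: one flag per column of the value array (which rows are nonzero) and one flag per row of the index array. This is an assumption about how the approach might be adapted, not part of the original answer. It relies on int64 flags, so it only works as written when the value array has fewer than 63 rows; the names col_flags, row_flags, popcount and keep_mask are hypothetical.

```python
import numpy as np

a1 = np.array([
    [9.968594, 8.655439, 0.0, 0.0],
    [0.0, 8.3356, 0.0, 8.8974],
    [0.0, 0.0, 6.103182, 7.330564],
    [6.609862, 0.0, 3.0614321, 0.0],
    [9.497023, 0.0, 3.8914037, 0.0],
    [0.0, 8.457685, 8.602337, 0.0],
    [0.0, 0.0, 5.826657, 8.283971],
])
a2 = np.array([[2, 5, 1], [1, 6, 4], [0, 0, 0], [2, 3, 6], [4, 2, 4]])
n = 2
num_rows = a1.shape[0]  # assumption: fewer than 63 rows, so flags fit in int64

# Bit r of col_flags[c] is set when a1[r, c] is nonzero.
row_bits = np.int64(1) << np.arange(num_rows, dtype=np.int64)
col_flags = np.bitwise_or.reduce(np.where(a1 != 0, row_bits[:, None], 0), axis=0)

# Bit k of row_flags[i] is set when row i of a2 contains the index k.
row_flags = np.bitwise_or.reduce(np.int64(1) << a2.astype(np.int64), axis=1)

# common[i, c]: indexes of a2 row i that are nonzero rows of column c.
common = row_flags[:, None] & col_flags[None, :]


def popcount(x):
    """Count the set bits of every element of a non-negative integer array."""
    x = x.copy()
    count = np.zeros_like(x)
    while x.any():
        count += x & 1
        x >>= 1
    return count


# Collect, per column, the bits of every index row that shares >= n indexes.
matched = popcount(common) >= n
keep_flags = np.bitwise_or.reduce(np.where(matched, common, 0), axis=0)

# Unpack the per-column flags back into a boolean mask shaped like a1.
shifts = np.arange(num_rows, dtype=np.int64)[:, None]
keep_mask = ((keep_flags[None, :] >> shifts) & 1).astype(bool)
out = np.where(keep_mask, a1, 0.0)
print(out)
```

On this data the result agrees with the expected output in the question. For arrays with more rows, the flags would have to be split across several integers, which this sketch does not attempt.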

Source: https://stackoverflow.com/questions/56532577/filter-tensorflow-array-with-specific-condition-over-numpy-array

Tags: python, arrays, numpy, tensorflow, slice