Removing elements from an array that are in another array

后端未结

关注

 5  922

抹茶落季 2020-11-30 04:06

Say I have these 2D arrays A and B.

How can I remove elements from A that are in B. (Complement in set theory: A-B)

A=np.asarray([[1,1,1], [1,1,2], [


      
      
        
          5条回答        

        
                    
            
            
                         
                
              
              
                
                   夕颜
                                             
                
                
                (楼主)
            
              
              
                2020-11-30 04:50
              

            
            
                        
Here is a Numpythonic approach with broadcasting:

In [83]: A[np.all(np.any((A-B[:, None]), axis=2), axis=0)]
Out[83]: 
array([[1, 1, 2],
       [1, 1, 3]])


Here is a timeit with other answer:

In [90]: def cal_diff(A, B):
   ....:     A_rows = A.view([('', A.dtype)] * A.shape[1])
   ....:     B_rows = B.view([('', B.dtype)] * B.shape[1])
   ....:     return np.setdiff1d(A_rows, B_rows).view(A.dtype).reshape(-1, A.shape[1])
   ....: 

In [93]: %timeit cal_diff(A, B)
10000 loops, best of 3: 54.1 µs per loop

In [94]: %timeit A[np.all(np.any((A-B[:, None]), axis=2), axis=0)]
100000 loops, best of 3: 9.41 µs per loop

# Even better with Divakar's suggestion
In [97]: %timeit A[~((A[:,None,:] == B).all(-1)).any(1)]
100000 loops, best of 3: 7.41 µs per loop


Well, if you are looking for a faster way you should looking for ways that reduce the number of comparisons. In this case (without considering the order) you can generate a unique number from your rows and compare the numbers which can be done with summing the items power of two.

Here is the benchmark with Divakar's in1d approach:

In [144]: def in1d_approach(A,B):
   .....:         dims = np.maximum(B.max(0),A.max(0))+1
   .....:         return A[~np.in1d(np.ravel_multi_index(A.T,dims),\
   .....:                          np.ravel_multi_index(B.T,dims))]
   .....: 

In [146]: %timeit in1d_approach(A, B)
10000 loops, best of 3: 23.8 µs per loop

In [145]: %timeit A[~np.in1d(np.power(A, 2).sum(1), np.power(B, 2).sum(1))]
10000 loops, best of 3: 20.2 µs per loop


You can use np.diff to get the an order independent result:

In [194]: B=np.array([[0, 0, 0,], [1, 0, 2,], [1, 0, 3,], [1, 0, 4,], [1, 1, 0,], [1, 1, 1,], [1, 1, 4,], [4, 1, 1]])

In [195]: A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
Out[195]: 
array([[1, 1, 2],
       [1, 1, 3]])

In [196]: %timeit A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
10000 loops, best of 3: 30.7 µs per loop


Benchmark with Divakar's setup:

In [198]: B = np.random.randint(0,9,(1000,3))

In [199]: A = np.random.randint(0,9,(100,3))

In [200]: A_idx = np.random.choice(np.arange(A.shape[0]),size=10,replace=0)

In [201]: B_idx = np.random.choice(np.arange(B.shape[0]),size=10,replace=0)

In [202]: A[A_idx] = B[B_idx]

In [203]: %timeit A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
10000 loops, best of 3: 137 µs per loop

In [204]: %timeit A[~np.in1d(np.power(A, 2).sum(1), np.power(B, 2).sum(1))]
10000 loops, best of 3: 112 µs per loop

In [205]: %timeit in1d_approach(A, B)
10000 loops, best of 3: 115 µs per loop


Timing with larger arrays (Divakar's solution is slightly faster):

In [231]: %timeit A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
1000 loops, best of 3: 1.01 ms per loop

In [232]: %timeit A[~np.in1d(np.power(A, 2).sum(1), np.power(B, 2).sum(1))]
1000 loops, best of 3: 880 µs per loop

In [233]:  %timeit in1d_approach(A, B)
1000 loops, best of 3: 807 µs per loop

    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它5个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复