np.concatenate a ND tensor/array with a 1D array

南笙 2021-01-14 09:33

I have two arrays a & b

a.shape
(5, 4, 3)
array([[[ 0.        ,  0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        ],
        [ 0.          


        
4 Answers
  •  感动是毒
    2021-01-14 10:03

    Simply broadcast b to 3D and then concatenate along the second axis -

    b3D = np.broadcast_to(b,(a.shape[0],1,len(b)))
    out = np.concatenate((a,b3D),axis=1)
    

    The broadcasting step with np.broadcast_to doesn't actually replicate anything or make copies; it simply returns a replicated view. The concatenation in the next step then performs the replication on the fly.
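    To make the view/copy distinction concrete, here is a small sketch (the toy shapes are mine, chosen to match the question) that checks b3D still shares memory with b before the concatenation:

    ```python
    import numpy as np

    a = np.arange(5 * 4 * 3, dtype=float).reshape(5, 4, 3)
    b = np.arange(3, dtype=float)

    # Broadcast b to shape (5, 1, 3): a read-only view, no data copied yet
    b3D = np.broadcast_to(b, (a.shape[0], 1, len(b)))
    print(np.shares_memory(b3D, b))   # True - still a view into b

    # The actual copy happens here, inside the concatenation
    out = np.concatenate((a, b3D), axis=1)
    print(out.shape)                  # (5, 5, 3)
    ```

    Every row `out[i, 4, :]` of the result is a copy of b, appended after the original four rows of `a[i]`.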

    Benchmarking

    In this section we compare the np.repeat version from @cᴏʟᴅsᴘᴇᴇᴅ's solution against the np.broadcast_to one, with a focus on performance. The broadcasting-based one performs the replication and the concatenation in the second step as a single merged command, so to speak, while the np.repeat version makes a copy and then concatenates, in two separate steps.
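    Before timing them, a quick sanity check (a small sketch using the question's original shapes) that the two approaches produce identical results:

    ```python
    import numpy as np

    a = np.random.rand(5, 4, 3)
    b = np.random.rand(3)

    # Approach 1: explicit copy via repeat, then concatenate (two steps)
    b_rep = b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
    out_repeat = np.concatenate((a, b_rep), axis=1)

    # Approach 2: broadcast view; the copy happens inside concatenate
    b_view = np.broadcast_to(b, (a.shape[0], 1, len(b)))
    out_bcast = np.concatenate((a, b_view), axis=1)

    print(np.array_equal(out_repeat, out_bcast))  # True
    ```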

    Timing the approaches as a whole:

    Case #1: a = (500,400,300) and b = (300,)

    In [321]: a = np.random.rand(500,400,300)
    
    In [322]: b = np.random.rand(300)
    
    In [323]: %%timeit
         ...: b3D = b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
         ...: r = np.concatenate((a, b3D), axis=1)
    10 loops, best of 3: 72.1 ms per loop
    
    In [325]: %%timeit
         ...: b3D = np.broadcast_to(b,(a.shape[0],1,len(b)))
         ...: out = np.concatenate((a,b3D),axis=1)
    10 loops, best of 3: 72.5 ms per loop
    

    For smaller input shapes, the call to np.broadcast_to takes a bit longer than np.repeat, since the work needed to set up the broadcasting is apparently more involved, as the timings below suggest:

    In [360]: a = np.random.rand(5,4,3)
    
    In [361]: b = np.random.rand(3)
    
    In [366]: %timeit np.broadcast_to(b,(a.shape[0],1,len(b)))
    100000 loops, best of 3: 3.12 µs per loop
    
    In [367]: %timeit b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
    1000000 loops, best of 3: 957 ns per loop
    

    But the broadcasting part has a roughly constant cost irrespective of the input shapes, i.e. it stays around the 3 µs mark, whereas the timing for the counterpart, b.reshape(1, 1, -1).repeat(a.shape[0], axis=0), depends on the input shapes. So, let's dig deeper and see how the concatenation steps of the two approaches behave.

    Digging deeper

    To see how much time the concatenation part itself is consuming:

    In [353]: a = np.random.rand(500,400,300)
    
    In [354]: b = np.random.rand(300)
    
    In [355]: b3D = np.broadcast_to(b,(a.shape[0],1,len(b)))
    
    In [356]: %timeit np.concatenate((a,b3D),axis=1)
    10 loops, best of 3: 72 ms per loop
    
    In [357]: b3D = b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
    
    In [358]: %timeit np.concatenate((a,b3D),axis=1)
    10 loops, best of 3: 72 ms per loop
    

    Conclusion: Doesn't seem too different.

    Now, let's try a case where the replication count for b is larger and b itself has a noticeably high number of elements as well.

    In [344]: a = np.random.rand(10000, 10, 1000)
    
    In [345]: b = np.random.rand(1000)
    
    In [346]: b3D = np.broadcast_to(b,(a.shape[0],1,len(b)))
    
    In [347]: %timeit np.concatenate((a,b3D),axis=1)
    10 loops, best of 3: 130 ms per loop
    
    In [348]: b3D = b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
    
    In [349]: %timeit np.concatenate((a,b3D),axis=1)
    10 loops, best of 3: 141 ms per loop
    

    Conclusion: The merged concatenate+replication with np.broadcast_to seems to do a bit better here.

    Let's try the original (5,4,3) case:

    In [360]: a = np.random.rand(5,4,3)
    
    In [361]: b = np.random.rand(3)
    
    In [362]: b3D = np.broadcast_to(b,(a.shape[0],1,len(b)))
    
    In [363]: %timeit np.concatenate((a,b3D),axis=1)
    1000000 loops, best of 3: 948 ns per loop
    
    In [364]: b3D = b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
    
    In [365]: %timeit np.concatenate((a,b3D),axis=1)
    1000000 loops, best of 3: 950 ns per loop
    

    Conclusion: Again, not too different.

    So, the final conclusion is: if b has a lot of elements and the first axis of a is also large (since that is the replication count), np.broadcast_to is a good option; otherwise, the np.repeat-based version handles the other cases well.
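    For convenience, the broadcasting approach can be wrapped in a small helper (the function name `append_1d` is my own, not part of the answer; it fixes the concatenation to axis 1 as in the question):

    ```python
    import numpy as np

    def append_1d(a, b):
        """Append 1D array b as an extra row along axis 1 of 3D array a.

        b is broadcast to shape (a.shape[0], 1, len(b)), so the only
        copy happens inside np.concatenate.
        """
        b3D = np.broadcast_to(b, (a.shape[0], 1, len(b)))
        return np.concatenate((a, b3D), axis=1)

    a = np.zeros((5, 4, 3))
    b = np.ones(3)
    out = append_1d(a, b)
    print(out.shape)  # (5, 5, 3)
    ```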
