np.concatenate a ND tensor/array with a 1D array

南笙 2021-01-14 09:33

I have two arrays a & b

a.shape
(5, 4, 3)
array([[[ 0.        ,  0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        ],
        [ 0.          


        
4 Answers
  •  感动是毒
    2021-01-14 10:03

    Simply broadcast b to 3D and then concatenate along the second axis -

    b3D = np.broadcast_to(b,(a.shape[0],1,len(b)))
    out = np.concatenate((a,b3D),axis=1)
    

    The broadcasting step with np.broadcast_to doesn't actually replicate anything or make copies; it simply returns a replicated view. The concatenation in the next step then performs the replication on the fly.
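    To make the view/copy distinction concrete, here is a small sketch (the toy shapes are mine, chosen to match the question) that checks b3D still shares memory with b before the concatenation:

    ```python
    import numpy as np

    a = np.arange(5 * 4 * 3, dtype=float).reshape(5, 4, 3)
    b = np.arange(3, dtype=float)

    # Broadcast b to shape (5, 1, 3): a read-only view, no data copied yet
    b3D = np.broadcast_to(b, (a.shape[0], 1, len(b)))
    print(np.shares_memory(b3D, b))   # True - still a view into b

    # The actual copy happens here, inside the concatenation
    out = np.concatenate((a, b3D), axis=1)
    print(out.shape)                  # (5, 5, 3)
    ```

    Every row `out[i, 4, :]` of the result is a copy of b, appended after the original four rows of `a[i]`.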

    Benchmarking

    In this section we compare the np.repeat version from @cᴏʟᴅsᴘᴇᴇᴅ's solution against the np.broadcast_to one, with a focus on performance. The broadcasting-based one performs the replication and the concatenation in the second step as a single merged command, so to speak, while the np.repeat version makes a copy and then concatenates, in two separate steps.
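    Before timing them, a quick sanity check (a small sketch using the question's original shapes) that the two approaches produce identical results:

    ```python
    import numpy as np

    a = np.random.rand(5, 4, 3)
    b = np.random.rand(3)

    # Approach 1: explicit copy via repeat, then concatenate (two steps)
    b_rep = b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
    out_repeat = np.concatenate((a, b_rep), axis=1)

    # Approach 2: broadcast view; the copy happens inside concatenate
    b_view = np.broadcast_to(b, (a.shape[0], 1, len(b)))
    out_bcast = np.concatenate((a, b_view), axis=1)

    print(np.array_equal(out_repeat, out_bcast))  # True
    ```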

    Timing the approaches as a whole:

    Case #1: a = (500,400,300) and b = (300,)

    In [321]: a = np.random.rand(500,400,300)
    
    In [322]: b = np.random.rand(300)
    
    In [323]: %%timeit
         ...: b3D = b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
         ...: r = np.concatenate((a, b3D), axis=1)
    10 loops, best of 3: 72.1 ms per loop
    
    In [325]: %%timeit
         ...: b3D = np.broadcast_to(b,(a.shape[0],1,len(b)))
         ...: out = np.concatenate((a,b3D),axis=1)
    10 loops, best of 3: 72.5 ms per loop
    

    For smaller input shapes, the call to np.broadcast_to takes a bit longer than np.repeat, since the work needed to set up the broadcasting is apparently more involved, as the timings below suggest:

    In [360]: a = np.random.rand(5,4,3)
    
    In [361]: b = np.random.rand(3)
    
    In [366]: %timeit np.broadcast_to(b,(a.shape[0],1,len(b)))
    100000 loops, best of 3: 3.12 µs per loop
    
    In [367]: %timeit b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
    1000000 loops, best of 3: 957 ns per loop
    

    But the broadcasting part has a roughly constant cost irrespective of the input shapes, i.e. it stays around the 3 µs mark, whereas the timing for the counterpart, b.reshape(1, 1, -1).repeat(a.shape[0], axis=0), depends on the input shapes. So, let's dig deeper and see how the concatenation steps of the two approaches behave.

    Digging deeper

    To see how much time the concatenation part itself is consuming:

    In [353]: a = np.random.rand(500,400,300)
    
    In [354]: b = np.random.rand(300)
    
    In [355]: b3D = np.broadcast_to(b,(a.shape[0],1,len(b)))
    
    In [356]: %timeit np.concatenate((a,b3D),axis=1)
    10 loops, best of 3: 72 ms per loop
    
    In [357]: b3D = b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
    
    In [358]: %timeit np.concatenate((a,b3D),axis=1)
    10 loops, best of 3: 72 ms per loop
    

    Conclusion: Doesn't seem too different.

    Now, let's try a case where the replication count for b is larger and b itself has a noticeably high number of elements as well.

    In [344]: a = np.random.rand(10000, 10, 1000)
    
    In [345]: b = np.random.rand(1000)
    
    In [346]: b3D = np.broadcast_to(b,(a.shape[0],1,len(b)))
    
    In [347]: %timeit np.concatenate((a,b3D),axis=1)
    10 loops, best of 3: 130 ms per loop
    
    In [348]: b3D = b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
    
    In [349]: %timeit np.concatenate((a,b3D),axis=1)
    10 loops, best of 3: 141 ms per loop
    

    Conclusion: The merged concatenate+replication with np.broadcast_to seems to do a bit better here.

    Let's try the original (5,4,3) case:

    In [360]: a = np.random.rand(5,4,3)
    
    In [361]: b = np.random.rand(3)
    
    In [362]: b3D = np.broadcast_to(b,(a.shape[0],1,len(b)))
    
    In [363]: %timeit np.concatenate((a,b3D),axis=1)
    1000000 loops, best of 3: 948 ns per loop
    
    In [364]: b3D = b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
    
    In [365]: %timeit np.concatenate((a,b3D),axis=1)
    1000000 loops, best of 3: 950 ns per loop
    

    Conclusion: Again, not too different.

    So, the final conclusion is: if b has a lot of elements and the first axis of a is also large (since that is the replication count), np.broadcast_to is a good option; otherwise, the np.repeat-based version handles the other cases well.
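    For convenience, the broadcasting approach can be wrapped in a small helper (the function name `append_1d` is my own, not part of the answer; it fixes the concatenation to axis 1 as in the question):

    ```python
    import numpy as np

    def append_1d(a, b):
        """Append 1D array b as an extra row along axis 1 of 3D array a.

        b is broadcast to shape (a.shape[0], 1, len(b)), so the only
        copy happens inside np.concatenate.
        """
        b3D = np.broadcast_to(b, (a.shape[0], 1, len(b)))
        return np.concatenate((a, b3D), axis=1)

    a = np.zeros((5, 4, 3))
    b = np.ones(3)
    out = append_1d(a, b)
    print(out.shape)  # (5, 5, 3)
    ```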
