Build numpy array with multiple custom index ranges without explicit loop

北海茫月 2020-12-05 21:45

In NumPy, is there a Pythonic way to create array3 with custom ranges from array1 and array2 without a loop? The straightforward solution of iterating over the ranges works.

4 answers
  •  陌清茗 (original poster)
    2020-12-05 22:23

    Assuming the ranges do not overlap, you could build a mask which is nonzero wherever the index falls inside one of the half-open ranges [array1[i], array2[i]), and then use np.flatnonzero to obtain the array of those indices, which is the desired array3:

    import numpy as np
    
    array1 = np.array([10, 65, 200]) 
    array2 = np.array([14, 70, 204])
    
    first, last = array1.min(), array2.max()
    array3 = np.zeros(last-first+1, dtype='i1')
    array3[array1-first] = 1
    array3[array2-first] = -1
    array3 = np.flatnonzero(array3.cumsum())+first
    print(array3)
    

    yields

    [ 10  11  12  13  65  66  67  68  69 200 201 202 203]
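
    The non-overlap assumption also covers ranges that merely touch: if one range stops exactly where the next one starts (say 10..14 followed by 14..20), the -1 assignment overwrites the 1 at the shared index and the result is wrong. As a sketch of one possible workaround (not part of the original answer; the name using_add_at is made up here), the markers can be accumulated with np.add.at instead of assigned:

```python
import numpy as np

def using_add_at(array1, array2):
    # Hypothetical variant: accumulate +1/-1 markers instead of assigning
    # them, so a stop and a start at the same index cancel out to 0.
    first, last = array1.min(), array2.max()
    delta = np.zeros(last - first + 1, dtype=np.int64)
    np.add.at(delta, array1 - first, 1)
    np.add.at(delta, array2 - first, -1)
    return np.flatnonzero(delta.cumsum()) + first

# Touching ranges: [10, 14) and [14, 20)
starts = np.array([10, 14])
stops = np.array([14, 20])
touching = using_add_at(starts, stops)
print(touching)  # [10 11 12 13 14 15 16 17 18 19]
```

    np.add.at is typically slower than plain fancy-index assignment, so this trades some speed for robustness.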
    

    For large len(array1), using_flatnonzero can be significantly faster than using_loop:

    def using_flatnonzero(array1, array2):
        first, last = array1.min(), array2.max()
        array3 = np.zeros(last-first+1, dtype='i1')
        array3[array1-first] = 1
        array3[array2-first] = -1
        return np.flatnonzero(array3.cumsum())+first
    
    def using_loop(array1, array2):
        return np.concatenate([np.arange(array1[i], array2[i]) for i in
                               np.arange(0,len(array1))])
    
    
    array1, array2 = (np.random.choice(range(1, 11), size=10**4, replace=True)
                      .cumsum().reshape(2, -1, order='F'))
    
    
    assert np.allclose(using_flatnonzero(array1, array2), using_loop(array1, array2))
    

    In [260]: %timeit using_loop(array1, array2)
    100 loops, best of 3: 9.36 ms per loop
    
    In [261]: %timeit using_flatnonzero(array1, array2)
    1000 loops, best of 3: 564 µs per loop
    

    If the ranges overlap, then using_loop will return an array3 which contains duplicates. using_flatnonzero returns an array with no duplicates.
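
    The difference is easy to check on a small overlapping input. The sketch below repeats the two functions (using_loop is written with zip here, which should be equivalent) and runs them on the made-up ranges 10..20 and 15..25:

```python
import numpy as np

def using_flatnonzero(array1, array2):
    first, last = array1.min(), array2.max()
    array3 = np.zeros(last - first + 1, dtype='i1')
    array3[array1 - first] = 1
    array3[array2 - first] = -1
    return np.flatnonzero(array3.cumsum()) + first

def using_loop(array1, array2):
    return np.concatenate([np.arange(a, b) for a, b in zip(array1, array2)])

# Two overlapping half-open ranges: [10, 20) and [15, 25)
starts = np.array([10, 15])
stops = np.array([20, 25])

loop_result = using_loop(starts, stops)
mask_result = using_flatnonzero(starts, stops)

# The loop version lists 15..19 twice; the mask version lists each index once.
print(len(loop_result), len(mask_result))  # 20 15
```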


    Explanation: Let's look at a small example with

    array1 = np.array([10, 65, 200]) 
    array2 = np.array([14, 70, 204])
    

    The objective is to build an array which looks like goal, below. The 1's are located at index values [ 10, 11, 12, 13, 65, 66, 67, 68, 69, 200, 201, 202, 203] (i.e. array3):

    In [306]: goal
    Out[306]: 
    array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
           1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=int8)
    

    Once we have the goal array, array3 can be obtained with a call to np.flatnonzero:

    In [307]: np.flatnonzero(goal)
    Out[307]: array([ 10,  11,  12,  13,  65,  66,  67,  68,  69, 200, 201, 202, 203])
    

    goal has the same length as array2.max():

    In [308]: array2.max()
    Out[308]: 204
    
    In [309]: goal.shape
    Out[309]: (204,)
    

    So we can begin by allocating an array with one extra element, so that array2.max() itself is a valid index for the -1 marker:

    goal = np.zeros(array2.max()+1, dtype='i1')
    

    and then filling in 1's at the index locations given by array1 and -1's at the indices given by array2:

    In [311]: goal[array1] = 1
    In [312]: goal[array2] = -1
    In [313]: goal
    Out[313]: 
    array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1,  0,  0,  0, -1,  0,  0,
            0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
            0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
            0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1,  0,  0,
            0,  0, -1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
            0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
            0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
            0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
            0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
            0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
            0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
            0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1,  0,  0,  0,
           -1], dtype=int8)
    

    Now applying cumsum (the cumulative sum) produces the desired goal array, plus one trailing 0 that np.flatnonzero ignores:

    In [314]: goal = goal.cumsum(); goal
    Out[314]: 
    array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
           1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0])
    
    In [315]: np.flatnonzero(goal)
    Out[315]: array([ 10,  11,  12,  13,  65,  66,  67,  68,  69, 200, 201, 202, 203])
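
    The same three steps (mark, cumsum, flatnonzero) are easier to follow on an array short enough to read at a glance. The values 2, 4, 6 and 9 below are made up for illustration:

```python
import numpy as np

starts = np.array([2, 6])  # first index of each range (inclusive)
stops = np.array([4, 9])   # end of each range (exclusive)

# Step 1: put +1 where a range starts and -1 where it stops.
markers = np.zeros(stops.max() + 1, dtype='i1')
markers[starts] = 1
markers[stops] = -1
# markers is now [0, 0, 1, 0, -1, 0, 1, 0, 0, -1]

# Step 2: the running sum is 1 inside a range and 0 outside.
mask = markers.cumsum()
# mask is now    [0, 0, 1, 1, 0, 0, 1, 1, 1, 0]

# Step 3: the positions of the nonzero entries are the wanted indices.
indices = np.flatnonzero(mask)
print(indices)  # [2 3 6 7 8]
```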
    

    That's the main idea behind using_flatnonzero. The subtraction of first was simply to save a bit of memory.
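
    To get a feel for the saving, the sketch below shifts the example ranges up by one million (an arbitrary choice for illustration) and compares the size of the mask with and without the offset:

```python
import numpy as np

array1 = np.array([1_000_010, 1_000_065, 1_000_200])
array2 = np.array([1_000_014, 1_000_070, 1_000_204])

first, last = array1.min(), array2.max()

# Without the offset, the mask has to cover every index from 0 to last.
full_mask = np.zeros(last + 1, dtype='i1')
# With the offset, it only has to cover first..last.
small_mask = np.zeros(last - first + 1, dtype='i1')

print(full_mask.nbytes)   # 1000205
print(small_mask.nbytes)  # 195
```

    Both masks lead to the same array3; only the amount of zero padding in front of the first range differs.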
