Build numpy array with multiple custom index ranges without explicit loop

前端未结

关注

 4  604

北海茫月 2020-12-05 21:45

In Numpy, is there a pythonic way to create array3 with custom ranges from array1 and array2 without a loop? The straightforward solution of iterating over the ranges works

4条回答

陌清茗 (楼主)

2020-12-05 22:23

Assuming the ranges do not overlap, you could build a mask which is nonzero where the index is between the ranges specified by array1 and array2 and then use np.flatnonzero to obtain an array of indices -- the desired array3:

import numpy as np

array1 = np.array([10, 65, 200]) 
array2 = np.array([14, 70, 204])

first, last = array1.min(), array2.max()
array3 = np.zeros(last-first+1, dtype='i1')
array3[array1-first] = 1
array3[array2-first] = -1
array3 = np.flatnonzero(array3.cumsum())+first
print(array3)

yields

[ 10  11  12  13  65  66  67  68  69 200 201 202 203]

For large len(array1), using_flatnonzero can be significantly faster than using_loop:

def using_flatnonzero(array1, array2):
    first, last = array1.min(), array2.max()
    array3 = np.zeros(last-first+1, dtype='i1')
    array3[array1-first] = 1
    array3[array2-first] = -1
    return np.flatnonzero(array3.cumsum())+first

def using_loop(array1, array2):
    return np.concatenate([np.arange(array1[i], array2[i]) for i in
                           np.arange(0,len(array1))])


array1, array2 = (np.random.choice(range(1, 11), size=10**4, replace=True)
                  .cumsum().reshape(2, -1, order='F'))


assert np.allclose(using_flatnonzero(array1, array2), using_loop(array1, array2))

In [260]: %timeit using_loop(array1, array2)
100 loops, best of 3: 9.36 ms per loop

In [261]: %timeit using_flatnonzero(array1, array2)
1000 loops, best of 3: 564 µs per loop

If the ranges overlap, then using_loop will return an array3 which contains duplicates. using_flatnonzero returns an array with no duplicates.

Explanation: Let's look at a small example with

array1 = np.array([10, 65, 200]) 
array2 = np.array([14, 70, 204])

The objective is to build an array which looks like goal, below. The 1's are located at index values [ 10, 11, 12, 13, 65, 66, 67, 68, 69, 200, 201, 202, 203] (i.e. array3):

In [306]: goal
Out[306]: 
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=int8)

Once we have the goal array, array3 can be obtained with a call to np.flatnonzero:

In [307]: np.flatnonzero(goal)
Out[307]: array([ 10,  11,  12,  13,  65,  66,  67,  68,  69, 200, 201, 202, 203])

goal has the same length as array2.max():

In [308]: array2.max()
Out[308]: 204

In [309]: goal.shape
Out[309]: (204,)

So we can begin by allocating

goal = np.zeros(array2.max()+1, dtype='i1')

and then filling in 1's at the index locations given by array1 and -1's at the indices given by array2:

In [311]: goal[array1] = 1
In [312]: goal[array2] = -1
In [313]: goal
Out[313]: 
array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1,  0,  0,  0, -1,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1,  0,  0,
        0,  0, -1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1,  0,  0,  0,
       -1], dtype=int8)

Now applying cumsum (the cumulative sum) produces the desired goal array:

In [314]: goal = goal.cumsum(); goal
Out[314]: 
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0])

In [315]: np.flatnonzero(goal)
Out[315]: array([ 10,  11,  12,  13,  65,  66,  67,  68,  69, 200, 201, 202, 203])

That's the main idea behind using_flatnonzero. The subtraction of first was simply to save a bit of memory.

0 讨论(0)

查看其它4个回答