Build numpy array with multiple custom index ranges without explicit loop

北海茫月 2020-12-05 21:45

In NumPy, is there a pythonic way to create array3 from the custom ranges defined by array1 and array2 without a loop? The straightforward solution of iterating over the ranges works, but I would like to avoid the explicit loop.
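
For concreteness, here is a minimal setup consistent with the outputs shown in the answers (the exact values of array1 and array2 are assumed here), together with the straightforward loop:

    import numpy as np

    # Assumed example inputs, inferred from the outputs quoted in the answers.
    array1 = np.array([10, 65, 200])   # start of each range
    array2 = np.array([14, 70, 204])   # stop (exclusive) of each range

    # The straightforward explicit-loop solution:
    array3 = np.concatenate([np.arange(array1[i], array2[i])
                             for i in range(len(array1))])
    # array3 -> [ 10  11  12  13  65  66  67  68  69 200 201 202 203]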

4 Answers
  •  离开以前
    2020-12-05 22:43

    Do you mean this?

    In [440]: np.r_[10:14,65:70,200:204]
    Out[440]: array([ 10,  11,  12,  13,  65,  66,  67,  68,  69, 200, 201, 202, 203])
    

    or generalizing:

    In [454]: np.r_[tuple([slice(i,j) for i,j in zip(array1,array2)])]
    Out[454]: array([ 10,  11,  12,  13,  65,  66,  67,  68,  69, 200, 201, 202, 203])
    

    Though this does involve a double loop: the explicit one that generates the slices, and one inside r_ that converts each slice to an arange.

        # Excerpt from numpy's index_tricks (the AxisConcatenator machinery
        # behind np.r_): it loops over the keys and expands each slice with arange.
        for k in range(len(key)):
            scalar = False
            if isinstance(key[k], slice):
                step = key[k].step
                start = key[k].start
                    ...
                    newobj = _nx.arange(start, stop, step)
    

    I mention this because it shows that numpy developers consider your kind of iteration normal.

    I expect that @unutbu's clever, if somewhat obtuse (I haven't figured out what it is doing yet), solution is your best chance of speed. cumsum is a good tool when you need to work with ranges that can vary in length. It probably gains most when working with many small ranges. I don't think it works with overlapping ranges.
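
    To sketch that idea (an assumed illustration of the cumsum/flatnonzero pattern, not necessarily @unutbu's exact code): mark +1 where each range starts and -1 where it stops, take a running sum to get a 0/1 indicator, and keep the nonzero positions.

        def ranges_via_flatnonzero(starts, stops):
            # +1 where a range opens, -1 where it closes; np.add.at handles
            # a stop that coincides with the next range's start.
            delta = np.zeros(stops.max() + 1, dtype=int)
            np.add.at(delta, starts, 1)
            np.add.at(delta, stops, -1)
            # The running sum is 1 inside a range and 0 outside; overlapping
            # ranges would collapse into one, hence the caveat above.
            return np.flatnonzero(delta.cumsum())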

    ================

    np.vectorize uses np.frompyfunc. So this iteration can also be expressed with:

    In [467]: f=np.frompyfunc(lambda x,y: np.arange(x,y), 2,1)
    
    In [468]: f(array1,array2)
    Out[468]: 
    array([array([10, 11, 12, 13]), array([65, 66, 67, 68, 69]),
           array([200, 201, 202, 203])], dtype=object)
    
    In [469]: timeit np.concatenate(f(array1,array2))
    100000 loops, best of 3: 17 µs per loop
    
    In [470]: timeit np.r_[tuple([slice(i,j) for i,j in zip(array1,array2)])]
    10000 loops, best of 3: 65.7 µs per loop
    

    With @Darius's vectorize solution:

    In [474]: timeit result = np.concatenate(ranges(array1, array2), axis=0)
    10000 loops, best of 3: 52 µs per loop
    

    vectorize must be doing some extra work to allow more powerful use of broadcasting. Relative speeds may shift if array1 is much larger.
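
    The ranges helper timed above is not defined in this excerpt; a plausible vectorize-based sketch (an assumption, not necessarily @Darius's exact code) would be:

        # Hypothetical reconstruction of the vectorize-based helper.
        ranges = np.vectorize(np.arange, otypes=[object])
        result = np.concatenate(ranges(array1, array2), axis=0)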

    @unutbu's solution shows no particular advantage with this small array1.

    In [478]: timeit using_flatnonzero(array1,array2)
    10000 loops, best of 3: 57.3 µs per loop
    

    The OP's solution, iterative but without my r_ middleman, is good:

    In [483]: timeit array3 = np.concatenate([np.arange(array1[i], array2[i]) for i in np.arange(0,len(array1))])
    10000 loops, best of 3: 24.8 µs per loop
    

    It's often the case that with a small number of loops, a list comprehension is faster than fancier numpy operations.

    For @unutbu's larger test case, my timings are consistent with his, with a 17x speedup.

    ===================

    For the small sample arrays, @Divakar's solution is slower, but for the large ones it is 3x faster than @unutbu's. So it has more of a setup cost, but its cost grows more slowly with problem size.
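
    For reference, a cumsum-of-steps construction in that spirit (a sketch assuming sorted, non-empty, non-overlapping ranges; not necessarily @Divakar's exact code):

        def ranges_via_step_cumsum(starts, stops):
            lens = stops - starts        # length of each range
            ends = lens.cumsum()         # where each range ends in the output
            # Steps of 1 everywhere, except at each boundary, where the step
            # jumps from the previous range's stop to the next range's start.
            steps = np.ones(ends[-1], dtype=starts.dtype)
            steps[0] = starts[0]
            steps[ends[:-1]] = starts[1:] - stops[:-1] + 1
            return steps.cumsum()

    On the small example this reproduces array3 with a single allocation and one cumsum, which is consistent with a higher setup cost that pays off for large inputs.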
