Avoiding np.where when assigning to numpy array

萌比男神i 2020-12-10 21:18

I would like the following (or something similar) to work, without using np.where:

>>> A = np.arange(0,10)
>>> ind = np.logical_and(         


        
2 Answers
  •  猫巷女王i
    2020-12-10 22:04

    From the oft-referenced numpy indexing page:

    .... A single boolean index array is practically identical to x[obj.nonzero()] .... However, it is faster when obj.shape == x.shape.

    Called with only a condition, np.where(cond) is equivalent to np.nonzero(cond): both return a tuple of integer index arrays.
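A quick way to check that equivalence is the minimal sketch below. The array `cond` and the names `where_result`, `nonzero_result` and `same_indices` are made up for illustration and are not from the question.

```python
import numpy as np

# Illustrative condition array (not from the original question).
cond = np.array([False, True, False, True, True])

# With only a condition, np.where returns a tuple of index arrays,
# one per dimension, just like np.nonzero.
where_result = np.where(cond)
nonzero_result = np.nonzero(cond)

# Compare the two tuples element by element.
same_indices = all(
    np.array_equal(w, n) for w, n in zip(where_result, nonzero_result)
)
print(isinstance(where_result, tuple))  # True
print(same_indices)                     # True
```

Because the result is a tuple, the `[0]` in `np.where(ind)[0]` is needed to get at the index array itself for a 1-D input.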

    But let's do some simple timing:

    In [239]: x = np.arange(10000)
    In [240]: y = (x%2).astype(bool)
    In [241]: x[y].shape
    Out[241]: (5000,)
    In [242]: idx = np.nonzero(y)
    In [243]: x[idx].shape
    Out[243]: (5000,)
    In [244]: timeit x[y].shape
    89.9 µs ± 726 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    In [245]: timeit x[idx].shape
    13.3 µs ± 107 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    In [246]: timeit x[np.nonzero(y)].shape
    34.2 µs ± 893 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    

    So in this test, indexing with an integer array is faster than indexing with a boolean mask, even when the time for the explicit np.nonzero (i.e. where) call is included.
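To repeat the comparison outside IPython, something like the following sketch with the standard `timeit` module should do. The loop count and the names `t_mask`, `t_idx` and `t_both` are arbitrary choices here, and the numbers will differ with NumPy version and hardware, so no expected timings are shown.

```python
import timeit
import numpy as np

x = np.arange(10000)
y = (x % 2).astype(bool)   # boolean mask, True for odd values
idx = np.nonzero(y)        # tuple holding the integer positions of the True entries

# Both forms of indexing must select the same elements.
assert np.array_equal(x[y], x[idx])

n = 2000  # arbitrary loop count for this sketch
t_mask = timeit.timeit(lambda: x[y], number=n)               # boolean mask
t_idx = timeit.timeit(lambda: x[idx], number=n)              # precomputed indices
t_both = timeit.timeit(lambda: x[np.nonzero(y)], number=n)   # nonzero + indexing

for label, t in [("mask", t_mask), ("idx", t_idx), ("nonzero+idx", t_both)]:
    print(f"{label:12s} {t / n * 1e6:8.2f} µs per loop")
```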


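    A[ind][k] = ... does not change A, because A[ind] (boolean indexing) returns a copy, not a view. The assignment modifies that temporary copy, which is then discarded.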

    In [251]: A = np.arange(100,110)
    In [252]: ind = np.logical_and(A>104, A%2)
    In [253]: ind
    Out[253]: 
    array([False, False, False, False, False,  True, False,  True, False,
            True])
    In [254]: k = np.array([0,1,0], dtype=bool)
    In [255]: A[ind]
    Out[255]: array([105, 107, 109])
    In [256]: A[ind][k]
    Out[256]: array([107])
    In [257]: A[ind][k] = 12
    In [258]: A
    Out[258]: array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109])
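The copy can be made visible with `np.shares_memory`. This is only a sketch reusing the arrays from the session above; the names `selected` and `sliced` are introduced here for illustration.

```python
import numpy as np

A = np.arange(100, 110)
ind = np.logical_and(A > 104, A % 2)
k = np.array([0, 1, 0], dtype=bool)

# Boolean (advanced) indexing produces a new array.
selected = A[ind]
print(np.shares_memory(A, selected))  # False

# Basic slicing, by contrast, produces a view onto the same data.
sliced = A[5:]
print(np.shares_memory(A, sliced))    # True

# Writing into the copy leaves A untouched.
selected[k] = 12
print(selected[1])  # 12
print(A[7])         # 107
```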
    

    But using k to select from the indices returned by np.where(ind) works, because the final assignment then indexes A directly:

    In [262]: A[np.where(ind)[0][k]]=12
    In [263]: A
    Out[263]: array([100, 101, 102, 103, 104, 105, 106,  12, 108, 109])
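If this pattern comes up often, it could be wrapped in a small helper. The function name `set_nested` below is hypothetical, and the sketch assumes 1-D arrays; `np.flatnonzero(mask)` is used as a shorthand for `np.nonzero(mask)[0]` on a 1-D mask.

```python
import numpy as np

def set_nested(arr, outer_mask, inner_mask, value):
    """Assign value to arr[outer_mask][inner_mask] in place (1-D arrays only)."""
    # Integer positions where the outer mask is True.
    positions = np.flatnonzero(outer_mask)
    # Pick a subset of those positions with the inner mask, then index arr
    # once, so the assignment goes to arr itself and not to a copy.
    arr[positions[inner_mask]] = value

A = np.arange(100, 110)
ind = np.logical_and(A > 104, A % 2)
k = np.array([0, 1, 0], dtype=bool)

set_nested(A, ind, k, 12)
print(A[7])  # 12
```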
    

    Timings for a fetch rather than a set:

    In [264]: timeit A[np.where(ind)[0][k]]
    1.94 µs ± 75.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    In [265]: timeit A[ind][k]
    1.34 µs ± 13.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    

    So yes, the double masking is a bit faster in this case, but that doesn't matter if it doesn't work. Don't sweat the small time improvements.

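    A method that uses only boolean indexing: copy ind, then overwrite its True positions with k, so the new mask is True only where both selections apply: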

    In [345]: ind1=ind.copy()
    In [346]: ind1[ind] = k
    In [348]: A[ind1]=3
    In [349]: A
    Out[349]: array([100, 101, 102, 103, 104, 105, 106,   3, 108, 109])
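The same mask-merging step can be written as a function. This is a sketch under the assumption that `inner_mask` has exactly as many elements as `outer_mask` has True entries; the name `combine_masks` is made up for illustration.

```python
import numpy as np

def combine_masks(outer_mask, inner_mask):
    """Return a mask over the full array equivalent to [outer_mask][inner_mask]."""
    combined = outer_mask.copy()
    # Replace each True in the outer mask with the matching inner value.
    combined[outer_mask] = inner_mask
    return combined

A = np.arange(100, 110)
ind = np.logical_and(A > 104, A % 2)
k = np.array([0, 1, 0], dtype=bool)

ind1 = combine_masks(ind, k)
A[ind1] = 3          # single indexing step, so A itself is modified
print(np.flatnonzero(ind1))  # [7]
print(A[7])                  # 3
```

The original `ind` is left unchanged, so it can be reused with a different inner mask.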
    

    In this small example the timing is basically the same as for A[np.where(ind)[0][k]]=12.
