Access multiple elements of an array

心在旅途 2020-12-06 15:08

Is there a way to get array elements in one operation for known rows and columns of those elements? In each row I would like to access elements from col_start to col_end (ea

3 answers
  •  悲哀的现实
    2020-12-06 15:28

    import numpy as np

    A = np.arange(40).reshape(4,10)*.1
    startend = [[2,5],[3,6],[4,7],[5,8]]
    # flat index = column index + row number * row length
    index_list = [np.arange(v[0],v[1]) + i*A.shape[1]
                     for i,v in enumerate(startend)]
    # [array([2, 3, 4]), array([13, 14, 15]), array([24, 25, 26]), array([35, 36, 37])]
    A.flat[index_list]
    

    producing

    array([[ 0.2,  0.3,  0.4],
           [ 1.3,  1.4,  1.5],
           [ 2.4,  2.5,  2.6],
           [ 3.5,  3.6,  3.7]])
    

    This still iterates, but only through a simple list comprehension. I'm indexing the flattened (1d) version of A. np.take(A, index_list) also works, because np.take without an axis argument indexes the flattened array.
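    As a quick sanity check (a sketch, not part of the original answer), the three ways of indexing the flattened array agree. Converting index_list to an array up front is an assumption made here so that the (4, 3) result shape is explicit.

```python
import numpy as np

# Same 4x10 example array and per-row [start, end) intervals as above
A = np.arange(40).reshape(4, 10) * .1
startend = [[2, 5], [3, 6], [4, 7], [5, 8]]

# Flat (row-major) index = column index + row number * row length
index_list = np.array([np.arange(s, e) + i * A.shape[1]
                       for i, (s, e) in enumerate(startend)])

via_flat = A.flat[index_list]       # flatiter indexing
via_take = np.take(A, index_list)   # axis=None: indexes the flattened array
via_ravel = A.ravel()[index_list]   # explicit 1d view, then fancy indexing

print(via_flat.shape)   # (4, 3)
print(np.array_equal(via_flat, via_take) and np.array_equal(via_flat, via_ravel))  # True
```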

    If the row intervals differ in size, I can use np.r_ to concatenate them. It's not absolutely necessary, but it is a convenience when building up indices from multiple intervals and values.

    A.flat[np.r_[tuple(index_list)]]
    # array([ 0.2,  0.3,  0.4,  1.3,  1.4,  1.5,  2.4,  2.5,  2.6,  3.5,  3.6, 3.7])
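    For intervals that really do differ in length, the same idea fits in a small helper. ragged_flat_indices is a hypothetical name, and np.concatenate is used here in place of np.r_; both join the per-row ranges into one 1d index.

```python
import numpy as np

def ragged_flat_indices(startend, n_cols):
    """Hypothetical helper: join per-row [start, end) column ranges into one
    1d array of flat (row-major) indices for an array with n_cols columns."""
    return np.concatenate([np.arange(s, e) + i * n_cols
                           for i, (s, e) in enumerate(startend)])

A = np.arange(40).reshape(4, 10) * .1
startend = [[2, 5], [3, 7], [4, 6], [5, 8]]   # intervals of lengths 3, 4, 2, 3
flat = ragged_flat_indices(startend, A.shape[1])

print(flat.tolist())   # [2, 3, 4, 13, 14, 15, 16, 24, 25, 35, 36, 37]
print(A.flat[flat])    # 1d result, 12 values
```

    The result is necessarily 1d, since the rows contribute different numbers of elements.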
    

    The idx that ajcr uses in another answer can also be used without np.choose, by pairing it with a column array of row indices:

    idx = [np.arange(start, end) for start, end in startend]
    A[np.arange(A.shape[0])[:,None], idx]
    

    idx is like my index_list, except that it doesn't add the row offset i*A.shape[1]; it holds column indices only.

    np.array(idx)
    
    array([[2, 3, 4],
           [3, 4, 5],
           [4, 5, 6],
           [5, 6, 7]])
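    To make that relationship concrete, here is a small check, assuming the same 4x10 A: adding the row offset row * A.shape[1] to idx reproduces the flat indices, and both indexing routes return the same (4, 3) block.

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * .1
startend = [[2, 5], [3, 6], [4, 7], [5, 8]]

idx = np.array([np.arange(s, e) for s, e in startend])   # column indices, shape (4, 3)
rows = np.arange(A.shape[0])[:, None]                      # row indices, shape (4, 1)

# Adding the row offset turns column indices into flat indices
flat_idx = idx + rows * A.shape[1]

print(flat_idx[1].tolist())                              # [13, 14, 15]
print(np.array_equal(A[rows, idx], A.flat[flat_idx]))    # True
```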
    

    Since each arange has the same length, idx can be generated without iteration:

    col_start = np.array([2,3,4,5])
    idx = col_start[:,None] + np.arange(3)
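    The broadcasting step, spelled out with shapes in a short sketch: width is a name introduced here for the common window length.

```python
import numpy as np

col_start = np.array([2, 3, 4, 5])
width = 3   # every window has the same length

# (4, 1) column of starts + (3,) row of offsets broadcasts to shape (4, 3)
idx = col_start[:, None] + np.arange(width)

# Same result as building one arange per row in a Python loop
loop_idx = np.array([np.arange(s, s + width) for s in col_start])

print(idx.shape, np.array_equal(idx, loop_idx))   # (4, 3) True
```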
    

    The first index is a column array that broadcasts to match this idx.

    np.arange(A.shape[0])[:,None] 
    array([[0],
           [1],
           [2],
           [3]])
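    Putting the row column and the column windows together gives a loop-free extraction. row_windows is a hypothetical helper, and it assumes one start per row and that every window fits inside its row (no bounds handling).

```python
import numpy as np

def row_windows(A, col_start, width):
    """Hypothetical helper: row i of the result is A[i, col_start[i]:col_start[i] + width].

    Assumes len(col_start) == A.shape[0] and that every window fits inside its row.
    """
    col_start = np.asarray(col_start)
    rows = np.arange(A.shape[0])[:, None]          # shape (n_rows, 1)
    cols = col_start[:, None] + np.arange(width)   # shape (n_rows, width)
    return A[rows, cols]                           # both broadcast to (n_rows, width)

A = np.arange(40).reshape(4, 10) * .1
out = row_windows(A, [2, 3, 4, 5], 3)
print(out)
```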
    

    With this A and idx I get the following timings:

    In [515]: timeit np.choose(idx,A.T[:,:,None])
    10000 loops, best of 3: 30.8 µs per loop
    
    In [516]: timeit A[np.arange(A.shape[0])[:,None],idx]
    100000 loops, best of 3: 10.8 µs per loop
    
    In [517]: timeit A.flat[idx+np.arange(A.shape[0])[:,None]*A.shape[1]]
    10000 loops, best of 3: 24.9 µs per loop
    

    Indexing with flat is itself fast, but computing the flat index takes extra time, so for this small array plain 2d fancy indexing comes out ahead.

    For larger arrays, the speed of flat indexing dominates:

    A=np.arange(4000).reshape(40,100)*.1
    col_start=np.arange(20,60)
    idx=col_start[:,None]+np.arange(30)
    
    In [536]: timeit A[np.arange(A.shape[0])[:,None],idx]
    10000 loops, best of 3: 108 µs per loop
    
    In [537]: timeit A.flat[idx+np.arange(A.shape[0])[:,None]*A.shape[1]]
    10000 loops, best of 3: 59.4 µs per loop
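    If building the flat index by hand feels error-prone, np.ravel_multi_index computes the same row-major index from (row, column) arrays. This is a swapped-in alternative, not what was timed above, and its speed has not been measured here; the sketch only checks that it produces the same index.

```python
import numpy as np

A = np.arange(4000).reshape(40, 100) * .1
col_start = np.arange(20, 60)
idx = col_start[:, None] + np.arange(30)
rows = np.arange(A.shape[0])[:, None]

# Hand-built row-major flat index, as in the timings above
manual = idx + rows * A.shape[1]

# The same index via ravel_multi_index; the inputs are broadcast explicitly first.
# With the default mode='raise', an out-of-bounds column raises ValueError.
r, c = np.broadcast_arrays(rows, idx)
via_ravel = np.ravel_multi_index((r, c), A.shape)

print(np.array_equal(manual, via_ravel))             # True
print(np.array_equal(A.flat[manual], A[rows, idx]))  # True
```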
    

    The np.choose method runs into a hard-coded limit on the number of choice arrays (with this A, A.T[:,:,None] supplies 100 of them): Need between 2 and (32) array objects (inclusive).
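    One alternative without that limit, not mentioned in the answer, is np.take_along_axis (NumPy 1.15 and later). It pairs row i of idx with row i of A, so it needs no separate row-index array. A sketch on the large example:

```python
import numpy as np

A = np.arange(4000).reshape(40, 100) * .1
col_start = np.arange(20, 60)
idx = col_start[:, None] + np.arange(30)   # shape (40, 30), max column index 88

# out[i, j] = A[i, idx[i, j]]; idx must have the same number of dimensions as A
out = np.take_along_axis(A, idx, axis=1)

rows = np.arange(A.shape[0])[:, None]
print(out.shape, np.array_equal(out, A[rows, idx]))   # (40, 30) True
```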


    What about out-of-bounds idx? Going back to the original 4x10 A:

    col_start=np.array([2,4,6,8])
    idx=col_start[:,None]+np.arange(3)
    A[np.arange(A.shape[0])[:,None], idx]
    

    raises an IndexError because the last idx value is 10, which is too large for a row of 10 columns (valid column indices are 0 to 9).
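    Before choosing a fix, it can help to find which rows actually run off the end. A small sketch with the same 4x10 A: compare idx against both ends of the valid range (negative indices would wrap around silently rather than raise), and confirm that the unguarded indexing raises IndexError.

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * .1
col_start = np.array([2, 4, 6, 8])
idx = col_start[:, None] + np.arange(3)   # last row is [8, 9, 10]
rows = np.arange(A.shape[0])[:, None]

# Valid column indices are 0 .. A.shape[1] - 1; check both ends
in_bounds = (idx >= 0) & (idx < A.shape[1])
print(in_bounds.all(axis=1).tolist())     # [True, True, True, False]

try:
    A[rows, idx]
    raised = False
except IndexError:
    raised = True
print(raised)                             # True
```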

    You could clip idx

    idx=idx.clip(0,A.shape[1]-1)
    

    producing duplicate values in the last row

    [ 3.8,  3.9,  3.9]
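    The clip option end to end, again assuming the 4x10 A: column 10 in the last row is clamped to 9, so A[3, 9] appears twice.

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * .1
idx = np.array([2, 4, 6, 8])[:, None] + np.arange(3)
rows = np.arange(A.shape[0])[:, None]

# Clamp every column index into the valid range 0 .. n_cols - 1
clipped = idx.clip(0, A.shape[1] - 1)
out = A[rows, clipped]

print(clipped[-1].tolist())   # [8, 9, 9]
print(out[-1])                # A[3, 8], A[3, 9], A[3, 9]
```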
    

    You could also pad A on the right before indexing; 'edge' mode repeats the last column. See np.pad for other modes.

    np.pad(A,((0,0),(0,2)),'edge')[np.arange(A.shape[0])[:,None], idx]
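    The padding width can also be derived from idx instead of being hard-coded, and 'edge' is not the only mode: mode='constant' with constant_values=np.nan marks the missing positions instead of repeating the last column. Computing the width this way is an assumption of this sketch, and it only handles windows that run past the right edge.

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * .1
idx = np.array([2, 4, 6, 8])[:, None] + np.arange(3)   # max column index is 10
rows = np.arange(A.shape[0])[:, None]

# Extra columns needed on the right (1 here); assumes no negative indices
pad = max(0, int(idx.max()) - (A.shape[1] - 1))

# 'edge' repeats the last column, giving the same values as clipping
edge = np.pad(A, ((0, 0), (0, pad)), mode='edge')[rows, idx]

# 'constant' with NaN flags the positions that fell outside A
nan_filled = np.pad(A, ((0, 0), (0, pad)), mode='constant',
                    constant_values=np.nan)[rows, idx]

print(edge[-1])
print(nan_filled[-1])
```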
    

    Another option is to drop the out-of-bounds values. idx then becomes ragged: a list of lists (or an object array of lists). The flat approach can handle this, though the result is a 1d array rather than a 2d matrix.

    startend = [[2,5],[4,7],[6,9],[8,10]]
    index_list = [np.arange(v[0],v[1]) + i*A.shape[1] 
                     for i,v in enumerate(startend)]
    # [array([2, 3, 4]), array([14, 15, 16]), array([26, 27, 28]), array([38, 39])]
    
    A.flat[np.r_[tuple(index_list)]]
    # array([ 0.2,  0.3,  0.4,  1.4,  1.5,  1.6,  2.6,  2.7,  2.8,  3.8,  3.9])
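    The same 1d result can be produced without building per-row lists: index with a bounds-safe copy of the rectangular idx, then keep only the valid positions with a boolean mask, which flattens in row-major order. Replacing out-of-bounds columns with 0 is an arbitrary placeholder choice in this sketch; those values are discarded by the mask anyway.

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * .1
col_start = np.array([2, 4, 6, 8])
idx = col_start[:, None] + np.arange(3)
rows = np.arange(A.shape[0])[:, None]

valid = idx < A.shape[1]             # False where a window runs off the row
safe_idx = np.where(valid, idx, 0)   # placeholder column for invalid slots
result = A[rows, safe_idx][valid]    # boolean mask keeps valid values, row by row

print(result)
```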
    
