Is there a way to get array elements in one operation for known rows and columns of those elements? In each row I would like to access elements from col_start to col_end (ea
import numpy as np

A = np.arange(40).reshape(4,10)*.1
startend = [[2,5],[3,6],[4,7],[5,8]]
index_list = [np.arange(v[0],v[1]) + i*A.shape[1]
              for i,v in enumerate(startend)]
# [array([2, 3, 4]), array([13, 14, 15]), array([24, 25, 26]), array([35, 36, 37])]
A.flat[index_list]
producing
array([[ 0.2, 0.3, 0.4],
[ 1.3, 1.4, 1.5],
[ 2.4, 2.5, 2.6],
[ 3.5, 3.6, 3.7]])
This still has an iteration, but it's a rather basic one over a list.
I'm indexing the flattened, 1d, version of A. np.take(A, index_list) also works.
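The flat index used above is just row * ncols + col for a C-ordered array. As a cross-check, here is a minimal sketch that builds the same indices two ways, assuming (as in this example) that every interval has the same length. The names rows, cols, flat_idx and flat_idx2 are introduced only for illustration.

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * .1
startend = [[2, 5], [3, 6], [4, 7], [5, 8]]

# Column indices per row, and a matching column vector of row numbers.
cols = np.array([np.arange(s, e) for s, e in startend])   # shape (4, 3)
rows = np.arange(A.shape[0])[:, None]                     # shape (4, 1)

# Manual flat index: row * ncols + col (valid for C-ordered layout).
flat_idx = rows * A.shape[1] + cols

# np.ravel_multi_index performs the same conversion.
rows_b, cols_b = np.broadcast_arrays(rows, cols)
flat_idx2 = np.ravel_multi_index((rows_b, cols_b), A.shape)

# np.take with no axis indexes the flattened array.
result = np.take(A, flat_idx)
```

Both index arrays should match, and result should equal the 2D selection A[rows, cols].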
If the row intervals differ in size, I can use np.r_ to concatenate them. It's not absolutely necessary, but it is a convenience when building up indices from multiple intervals and values.
A.flat[np.r_[tuple(index_list)]]
# array([ 0.2, 0.3, 0.4, 1.3, 1.4, 1.5, 2.4, 2.5, 2.6, 3.5, 3.6, 3.7])
The idx that ajcr uses can be used without choose:
idx = [np.arange(v[0], v[1]) for i,v in enumerate(startend)]
A[np.arange(A.shape[0])[:,None], idx]
idx is like my index_list except that it doesn't add the row length.
np.array(idx)
array([[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7]])
Since each arange has the same length, idx can be generated without iteration:
col_start = np.array([2,3,4,5])
idx = col_start[:,None] + np.arange(3)
The first index is a column array that broadcasts to match this idx.
np.arange(A.shape[0])[:,None]
array([[0],
[1],
[2],
[3]])
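For comparison, np.take_along_axis (available in NumPy 1.15 and later) makes the same per-row selection without building the row index by hand. This is offered only as a sketch of an alternative, not as part of the approach above; by_broadcast and by_take are illustrative names.

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * .1
col_start = np.array([2, 3, 4, 5])
idx = col_start[:, None] + np.arange(3)      # shape (4, 3)

# Explicit broadcasting of a (4, 1) row index against the (4, 3) idx.
by_broadcast = A[np.arange(A.shape[0])[:, None], idx]

# Same selection; take_along_axis supplies the row index implicitly.
by_take = np.take_along_axis(A, idx, axis=1)
```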
With this A and idx I get the following timings:
In [515]: timeit np.choose(idx,A.T[:,:,None])
10000 loops, best of 3: 30.8 µs per loop
In [516]: timeit A[np.arange(A.shape[0])[:,None],idx]
100000 loops, best of 3: 10.8 µs per loop
In [517]: timeit A.flat[idx+np.arange(A.shape[0])[:,None]*A.shape[1]]
10000 loops, best of 3: 24.9 µs per loop
The flat indexing itself is faster, but calculating the flat index takes extra time, so on this small array the 2D version is quicker overall.
For large arrays, the speed of flat indexing dominates.
A=np.arange(4000).reshape(40,100)*.1
col_start=np.arange(20,60)
idx=col_start[:,None]+np.arange(30)
In [536]: timeit A[np.arange(A.shape[0])[:,None],idx]
10000 loops, best of 3: 108 µs per loop
In [537]: timeit A.flat[idx+np.arange(A.shape[0])[:,None]*A.shape[1]]
10000 loops, best of 3: 59.4 µs per loop
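Outside IPython, the comparison can be reproduced with the standard timeit module. This is only a rough sketch: the helper names fancy_2d and fancy_flat are made up for illustration, and the absolute numbers will depend on the machine and NumPy version.

```python
import timeit
import numpy as np

A = np.arange(4000).reshape(40, 100) * .1
col_start = np.arange(20, 60)
idx = col_start[:, None] + np.arange(30)
rows = np.arange(A.shape[0])[:, None]

def fancy_2d():
    # 2D advanced indexing with a broadcast row index.
    return A[rows, idx]

def fancy_flat():
    # Flat indexing; the index arithmetic is included in the timing.
    return A.flat[idx + rows * A.shape[1]]

n = 1000
t_2d = timeit.timeit(fancy_2d, number=n)
t_flat = timeit.timeit(fancy_flat, number=n)
print(f"2D fancy indexing: {t_2d / n * 1e6:.1f} µs per loop")
print(f"flat indexing:     {t_flat / n * 1e6:.1f} µs per loop")
```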
The np.choose method runs into a hardcoded limit: Need between 2 and (32) array objects (inclusive).
What about out-of-bounds idx?
col_start=np.array([2,4,6,8])
idx=col_start[:,None]+np.arange(3)
A[np.arange(A.shape[0])[:,None], idx]
produces an error because the last idx value is 10, too large.
You could clip idx:
idx=idx.clip(0,A.shape[1]-1)
producing duplicate values in the last row
[ 3.8, 3.9, 3.9]
You could also pad A before indexing. See np.pad for more options.
np.pad(A,((0,0),(0,2)),'edge')[np.arange(A.shape[0])[:,None], idx]
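To make the two options concrete, here is a small sketch that runs both on the same out-of-bounds idx. The NaN-padded variant is an extra illustration of the "more options" in np.pad, not something from the original example; the variable names are hypothetical.

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * .1
col_start = np.array([2, 4, 6, 8])
idx = col_start[:, None] + np.arange(3)       # last row is [8, 9, 10]
rows = np.arange(A.shape[0])[:, None]

# Option 1: clip the indices, which repeats the last valid column.
clipped = A[rows, idx.clip(0, A.shape[1] - 1)]

# Option 2: pad A on the right by replicating the edge column.
edge_padded = np.pad(A, ((0, 0), (0, 2)), mode='edge')[rows, idx]

# Option 3 (illustrative): pad with NaN so missing entries are marked.
nan_padded = np.pad(A, ((0, 0), (0, 2)), mode='constant',
                    constant_values=np.nan)[rows, idx]
```

Clipping and edge padding give the same result here; NaN padding instead flags the single out-of-range position in the last row.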
Another option is to remove out of bounds values. idx would then become a ragged list of lists (or array of lists). The flat approach can handle this, though the result will not be a matrix.
startend = [[2,5],[4,7],[6,9],[8,10]]
index_list = [np.arange(v[0],v[1]) + i*A.shape[1]
              for i,v in enumerate(startend)]
# [array([2, 3, 4]), array([14, 15, 16]), array([26, 27, 28]), array([38, 39])]
A.flat[np.r_[tuple(index_list)]]
# array([ 0.2, 0.3, 0.4, 1.4, 1.5, 1.6, 2.6, 2.7, 2.8, 3.8, 3.9])
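If the per-row grouping is needed afterwards, the flat result can be cut back into pieces with np.split. This is a sketch under the assumption that the interval lengths are still available; lengths and pieces are illustrative names.

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * .1
startend = [[2, 5], [4, 7], [6, 9], [8, 10]]
index_list = [np.arange(s, e) + i * A.shape[1]
              for i, (s, e) in enumerate(startend)]

# One flat lookup for all (ragged) intervals.
flat_values = A.flat[np.concatenate(index_list)]

# Split points are the cumulative interval lengths, minus the final total.
lengths = [len(ix) for ix in index_list]
pieces = np.split(flat_values, np.cumsum(lengths)[:-1])
```

pieces is then a list of 1D arrays, one per row, with the last one shorter than the rest.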
I think you're looking for something like the below. I'm not sure what you want to do with them when you access them though.
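indexes = [(4, 6), (0, 2), (2, 4), (8, 10)]
# Sketch of the wanted elements ('|') in each 12-column row:
#   [ . . . . | | | . . . . . ]
#   [ | | | . . . . . . . . . ]
#   [ . . | | | . . . . . . . ]
#   [ . . . . . . . . | | | . ]
arr = [list(range(12)) for _ in indexes]  # placeholder data, one row per index pair
for (start, end), row in zip(indexes, arr):
    print(row[start:end + 1])  # end is inclusive here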
You can use np.choose.
Here's an example NumPy array arr:
array([[ 0, 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12, 13],
[14, 15, 16, 17, 18, 19, 20]])
Let's say we want to pick the values [1, 2, 3] from the first row, [11, 12, 13] from the second row and [17, 18, 19] from the third row.
In other words, we'll pick out the indices from each row of arr as shown in an array idx:
array([[1, 2, 3],
[4, 5, 6],
[3, 4, 5]])
Then using np.choose:
>>> np.choose(idx, arr.T[:,:,np.newaxis])
array([[ 1, 2, 3],
[11, 12, 13],
[17, 18, 19]])
To explain what just happened: arr.T[:,:,np.newaxis] meant that arr was temporarily viewed as a 3D array with shape (7, 3, 1). You can imagine this as a 3D array where each column of the original arr is now a 2D column vector with three values. The 3D array looks a bit like this:
#  0       1       2       3       4       5       6
[[ 0]   [[ 1]   [[ 2]   [[ 3]   [[ 4]   [[ 5]   [[ 6]   # choose values from 1, 2, 3
 [ 7]    [ 8]    [ 9]    [10]    [11]    [12]    [13]   # choose values from 4, 5, 6
 [14]]   [15]]   [16]]   [17]]   [18]]   [19]]   [20]]  # choose values from 3, 4, 5
To get the zeroth row of the output array, choose selects the zeroth element from the 2D column at index 1, the zeroth element from the 2D column at index 2, and the zeroth element from the 2D column at index 3.
To get the first row of the output array, choose selects the first element from the 2D column at index 4, the first element from the 2D column at index 5, ... and so on.
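The selection rule described above amounts to out[i, j] = arr[i, idx[i, j]]. A short sketch that checks np.choose against an explicit loop, using the same arr and idx (the names chosen and expected are illustrative only):

```python
import numpy as np

arr = np.arange(21).reshape(3, 7)
idx = np.array([[1, 2, 3],
                [4, 5, 6],
                [3, 4, 5]])

# choices has shape (7, 3, 1): one (3, 1) column per original column.
chosen = np.choose(idx, arr.T[:, :, np.newaxis])

# Explicit version of the rule: out[i, j] = arr[i, idx[i, j]].
expected = np.empty_like(idx)
for i in range(idx.shape[0]):
    for j in range(idx.shape[1]):
        expected[i, j] = arr[i, idx[i, j]]
```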