pick TxK numpy array from TxN numpy array using TxK column index array

為{幸葍}努か 提交于 2020-01-05 13:10:46

Question


This is an indirect indexing problem.

It can be solved with a list comprehension.

The question is whether, and how, it can be solved within numpy.

Suppose data.shape is (T,N) and c.shape is (T,K),

and each element of c is an int between 0 and N-1 inclusive; that is, each element of c refers to a column number of data.

The goal is to obtain out where

out.shape = (T,K)

And for each i in 0..(T-1)

the row out[i] = [ data[i, c[i,0]] , ... , data[i, c[i,K-1]] ]

Concrete example:

import numpy as np

data = np.array([
       [ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

c = np.array([
      [0, 2],
      [1, 2],
      [0, 0],
      [1, 1],
      [2, 2]])

The expected result is out = [[0, 2], [4, 5], [6, 6], [10, 10], [14, 14]]

The first row of out is [0,2] because the columns chosen are given by c's row 0, namely 0 and 2, and the values of data[0] at columns 0 and 2 are 0 and 2.

The second row of out is [4,5] because the columns chosen are given by c's row 1, namely 1 and 2, and the values of data[1] at columns 1 and 2 are 4 and 5.

Numpy fancy indexing doesn't seem to solve this in an obvious way, because indexing data with c directly (e.g. data[c] or np.take(data,c,axis=1)) always produces a 3-dimensional array.
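To make the shape problem concrete, here is a small check using the example arrays (redefined so the snippet runs on its own). It only illustrates the observation above; by_rows and by_take are illustrative names, not part of the question.

```python
import numpy as np

data = np.arange(15).reshape(5, 3)   # same values as the example, shape (T, N) = (5, 3)
c = np.array([[0, 2], [1, 2], [0, 0], [1, 1], [2, 2]])  # shape (T, K) = (5, 2)

# data[c] treats every entry of c as a ROW index into data.
by_rows = data[c]
print(by_rows.shape)   # (5, 2, 3), i.e. (T, K, N)

# np.take along axis 1 applies ALL of c to every row of data.
by_take = np.take(data, c, axis=1)
print(by_take.shape)   # (5, 5, 2), i.e. (T, T, K)
```

The wanted values are present in the second result (by_take[i, i] is the desired row i), but everything off that diagonal is computed only to be thrown away.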

A list comprehension can solve it:

out = [ [data[rowidx,i1],data[rowidx,i2]] for (rowidx, (i1,i2)) in enumerate(c) ]

If K is 2, I suppose this is marginally OK. If K is variable, it is not so good.

The list comprehension has to be rewritten for each value of K, because it unrolls the columns picked out of data by each row of c. It also violates DRY.
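For reference, the unrolling itself can be avoided with a nested comprehension. The sketch below, using the example arrays, works for any K, but it is still a Python-level loop rather than a numpy solution, so it does not answer the question being asked.

```python
import numpy as np

data = np.arange(15).reshape(5, 3)   # same values as the example
c = np.array([[0, 2], [1, 2], [0, 0], [1, 1], [2, 2]])

# The inner comprehension walks over however many column indices a row of c
# holds, so nothing has to be unrolled by hand when K changes.
out = [[data[i, j] for j in row] for i, row in enumerate(c)]
print(np.array(out).tolist())  # [[0, 2], [4, 5], [6, 6], [10, 10], [14, 14]]
```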

Is there a solution based entirely in numpy?


Answer 1:


You can avoid loops with np.choose:

In [1]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.

data = np.array([\
       [ 0,  1,  2],\
       [ 3,  4,  5],\
       [ 6,  7,  8],\
       [ 9, 10, 11],\
       [12, 13, 14]])

c = np.array([
      [0, 2],\
      [1, 2],\
      [0, 0],\
      [1, 1],\
      [2, 2]])
--

In [2]: np.choose(c, data.T[:,:,np.newaxis])
Out[2]: 
array([[ 0,  2],
       [ 4,  5],
       [ 6,  6],
       [10, 10],
       [14, 14]])
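
A rough explanation of why this works (our reading, not part of the original answer): data.T[:,:,np.newaxis] has shape (N,T,1), which np.choose treats as N choice arrays of shape (T,1). Each choice is one column of data and broadcasts against c, which has shape (T,K), so position (i,j) of the result is taken from choice number c[i,j] at row i, which is data[i, c[i,j]]. The sketch below checks that against an explicit loop. One caveat: np.choose has historically capped the number of choice arrays (32 in NumPy 1.x, as far as we know), so this approach may fail when data has many columns.

```python
import numpy as np

data = np.arange(15).reshape(5, 3)   # same values as the example
c = np.array([[0, 2], [1, 2], [0, 0], [1, 1], [2, 2]])
T, K = c.shape

choices = data.T[:, :, np.newaxis]   # shape (N, T, 1): one (T, 1) column per choice
picked = np.choose(c, choices)       # shape (T, K)

# Reference result built element by element from the definition of out.
expected = np.empty((T, K), dtype=data.dtype)
for i in range(T):
    for j in range(K):
        expected[i, j] = data[i, c[i, j]]

print(np.array_equal(picked, expected))  # True
```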



Answer 2:


Here's one possible route to a general solution...

Create masks for data to select the values for each column of out. For example, the first mask could be achieved by writing:

>>> np.arange(3) == np.vstack(c[:,0])
array([[ True, False, False],
       [False,  True, False],
       [ True, False, False],
       [False,  True, False],
       [False, False,  True]], dtype=bool)

>>> data[_]
array([ 0,  4,  6, 10, 14])

The mask for the second column of out is built the same way: np.arange(3) == np.vstack(c[:,1]).

So, to get the out array...

>>> mask0 = np.arange(3) == np.vstack(c[:,0])
>>> mask1 = np.arange(3) == np.vstack(c[:,1])
>>> np.vstack((data[mask0], data[mask1])).T
array([[ 0,  2],
       [ 4,  5],
       [ 6,  6],
       [10, 10],
       [14, 14]])

Edit: Given arbitrary array widths K and N you could use a loop to create the masks, so the general construction of the out array might simply look like this:

np.vstack([data[np.arange(N) == np.vstack(c[:,i])] for i in range(K)]).T
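
The per-column loop can also be folded into a single broadcast comparison. This is a sketch of our own along the same lines as the masks above, assuming every entry of c is a valid column index so that each (row, k) pair matches exactly one column; mask, tiled and out are illustrative names.

```python
import numpy as np

data = np.arange(15).reshape(5, 3)   # same values as the example
c = np.array([[0, 2], [1, 2], [0, 0], [1, 1], [2, 2]])
T, N = data.shape
K = c.shape[1]

# mask[t, k, n] is True exactly where column n is the one requested by c[t, k].
mask = c[:, :, np.newaxis] == np.arange(N)            # shape (T, K, N)

# View data as (T, K, N) without copying, then pull out the single True
# entry for each (t, k); boolean indexing returns them in row-major order.
tiled = np.broadcast_to(data[:, np.newaxis, :], mask.shape)
out = tiled[mask].reshape(T, K)
print(out.tolist())  # [[0, 2], [4, 5], [6, 6], [10, 10], [14, 14]]
```

This builds a boolean array of T*K*N elements, so it trades memory for the absence of a Python loop.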

Edit 2: A slightly neater solution (though still relying on a loop) is:

np.vstack([data[i][c[i]] for i in range(T)])
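
For completeness, here are two loop-free alternatives that do not appear in the answers above. Pairing a column vector of row numbers with c lets integer-array indexing broadcast the two index arrays to shape (T,K). np.take_along_axis does the same job in one call but, to our knowledge, is only available in NumPy 1.15 and later. The names rows, out_index and out_take are illustrative.

```python
import numpy as np

data = np.arange(15).reshape(5, 3)   # same values as the example
c = np.array([[0, 2], [1, 2], [0, 0], [1, 1], [2, 2]])
T = data.shape[0]

# rows has shape (T, 1) and c has shape (T, K); they broadcast together,
# so element (i, j) of the result is data[rows[i, 0], c[i, j]] = data[i, c[i, j]].
rows = np.arange(T)[:, np.newaxis]
out_index = data[rows, c]

# Same selection expressed with take_along_axis (assumes NumPy >= 1.15).
out_take = np.take_along_axis(data, c, axis=1)

print(out_index.tolist())  # [[0, 2], [4, 5], [6, 6], [10, 10], [14, 14]]
print(np.array_equal(out_index, out_take))  # True
```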


Source: https://stackoverflow.com/questions/26222835/pick-txk-numpy-array-from-txn-numpy-array-using-txk-column-index-array
