Numpy View Reshape Without Copy (2d Moving/Sliding Window, Strides, Masked Memory Structures)

Submitted by 那年仲夏 on 2019-12-05 07:09:24

Your task isn't possible using strides alone, but NumPy does support one kind of array that does the job: by combining strides with a masked_array you can create the desired view of your data. However, not all NumPy functions support masked_array, so it is possible that scikit-learn doesn't handle these well either.

Let's first take a fresh look at what we are trying to do here. Consider the input data of your example. Fundamentally, the data is just a 1-d array in memory, and it is simpler to reason about strides in those terms. The array only appears to be 2-d because we have defined its shape. Using strides, that shape could be defined like this:

import numpy as np
from numpy.lib.stride_tricks import as_strided

base = np.arange(9)
isize = base.itemsize  # bytes per element
# Rows advance by 3 elements, columns by 1 element.
A = as_strided(base, shape=(3, 3), strides=(3 * isize, isize))
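As a quick sanity check of the stride arithmetic (a sketch, not part of the original answer), the strided A should equal an ordinary reshape of base and share its memory, so no data is copied:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

base = np.arange(9)
isize = base.itemsize

# Rows advance by 3 elements, columns by 1 element.
A = as_strided(base, shape=(3, 3), strides=(3 * isize, isize))

print(np.array_equal(A, base.reshape(3, 3)))  # True
print(np.shares_memory(A, base))              # True: A is a view, not a copy
print(A.strides == (3 * isize, isize))        # True
```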

Now the goal is to choose strides for base such that the numbers are ordered as in the target array, B. In other words, we are asking for integers a and b such that

>>> as_strided(base, shape=(4, 4), strides=(a, b))
array([[0, 1, 3, 4],
       [1, 2, 4, 5],
       [3, 4, 6, 7],
       [4, 5, 7, 8]])
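One way to convince yourself that no such pair exists is a brute-force search. This sketch assumes strides that are whole multiples of itemsize and that keep every read inside base: since B[0, 0] is 0 the view must start at base[0], so a and b (in elements) must be non-negative with a + b <= 2. Larger or negative strides would read outside the buffer, which as_strided does not guard against.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

base = np.arange(9)
isize = base.itemsize

# Target array B from the question.
B = np.array([[0, 1, 3, 4],
              [1, 2, 4, 5],
              [3, 4, 6, 7],
              [4, 5, 7, 8]])

# The largest offset read is 3*a + 3*b elements, which must stay <= 8,
# so only a, b >= 0 with a + b <= 2 are safe to try.
matches = []
for a in range(3):
    for b in range(3 - a):
        view = as_strided(base, shape=(4, 4), strides=(a * isize, b * isize))
        if np.array_equal(view, B):
            matches.append((a, b))

print(matches)  # []
```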

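But this is impossible: with a fixed stride b between columns, consecutive entries in a row are always b bytes apart in memory, yet the first row (0, 1, 3, 4) steps by 1, then 2, then 1 element. The closest view we can achieve like this is a rolling window over base: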

>>> C = as_strided(base, shape=(5, 5), strides=(isize, isize))
>>> C
array([[0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7],
       [4, 5, 6, 7, 8]])
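On NumPy 1.20 or newer, the same rolling window can also be built with sliding_window_view, which checks bounds and returns a read-only view by default. This is a sketch for comparison, assuming that NumPy version is available:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

base = np.arange(9)
isize = base.itemsize

C = as_strided(base, shape=(5, 5), strides=(isize, isize))

# Same rolling window of length 5 over the 9 elements, built safely.
W = sliding_window_view(base, window_shape=5)

print(W.shape)                    # (5, 5)
print(np.array_equal(C, W))       # True
print(np.shares_memory(W, base))  # True
print(W.flags.writeable)          # False
```

Because the rows of C overlap in memory, writing into a writeable as_strided view changes several entries at once, which is one reason the read-only default is convenient.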

The difference here is that we have extra rows and columns, which we would like to get rid of. So, effectively, we are asking for a rolling window that is not contiguous and also makes jumps at regular intervals. In this example, we want every third item excluded from each window, and we want to skip one starting position after every two rows.
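The skipping rule can be written down explicitly. The sketch below assumes base is a contiguous 1-D array holding a row-major h x w grid, and uses a hypothetical helper, rolling_view_and_keep (not a NumPy function), that builds the rolling view plus boolean selectors for the rows and columns that form kh x kw windows. For the 3 x 3 grid with 2 x 2 windows, both selectors drop index 2.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def rolling_view_and_keep(base, h, w, kh, kw):
    """Hypothetical helper: 1-D rolling view over a flattened h x w grid,
    plus boolean selectors for the rows/columns that form kh x kw windows.
    Assumes base is contiguous, 1-D, and has exactly h * w elements."""
    isize = base.itemsize
    length = (kh - 1) * w + kw            # span of one window in flat memory
    starts = (h - kh) * w + (w - kw) + 1  # number of candidate start positions
    C = as_strided(base, shape=(starts, length), strides=(isize, isize),
                   writeable=False)
    # A start is valid only if the window does not run past the grid row end.
    keep_rows = np.arange(starts) % w <= w - kw
    # Inside a window, only the first kw items of each grid row are wanted.
    keep_cols = np.arange(length) % w < kw
    return C, keep_rows, keep_cols

base = np.arange(9)
C, keep_rows, keep_cols = rolling_view_and_keep(base, 3, 3, 2, 2)
print(keep_rows)  # [ True  True False  True  True]
print(keep_cols)  # [ True  True False  True  True]

# Fancy indexing with np.ix_ yields the wanted 4 x 4 values, but as a copy.
B = C[np.ix_(keep_rows, keep_cols)]
print(B)
print(np.shares_memory(B, base))  # False
```

Fancy indexing gives the right values but always allocates a new array, which is why a mask is needed to keep everything as a view.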

We can describe this as a masked_array:

>>> mask = np.zeros((5, 5), dtype=bool)
>>> mask[2, :] = True
>>> mask[:, 2] = True
>>> D = np.ma.masked_array(C, mask=mask)

This array contains exactly the data that we want, and it is only a view of the original data. We can confirm that the unmasked values equal B:

>>> D.data[~D.mask].reshape(4, 4)
array([[0, 1, 3, 4],
       [1, 2, 4, 5],
       [3, 4, 6, 7],
       [4, 5, 7, 8]])
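To check the view claim directly, and to show a shorter way of extracting the unmasked values, here is a sketch using np.shares_memory and MaskedArray.compressed(). Note that compressed(), like the boolean indexing above, returns a new 1-D array, so only the masked array itself is a view:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

base = np.arange(9)
isize = base.itemsize
C = as_strided(base, shape=(5, 5), strides=(isize, isize))

mask = np.zeros((5, 5), dtype=bool)
mask[2, :] = True
mask[:, 2] = True
D = np.ma.masked_array(C, mask=mask)

# The masked array's data is still the strided view over base.
print(np.shares_memory(D.data, base))  # True
print(D.count())                       # 16 unmasked values

# compressed() returns the unmasked values in row-major order, as a copy.
flat = D.compressed().reshape(4, 4)
print(flat)
print(np.shares_memory(flat, base))    # False
```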

But as I said in the beginning, it is quite likely that scikit-learn doesn't understand masked arrays. If it simply converts this to a regular array, the mask is discarded and the data will be wrong:

>>> np.array(D)
array([[0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7],
       [4, 5, 6, 7, 8]])
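If scikit-learn needs an ordinary 2-D ndarray, a copy appears unavoidable. As an illustration (a sketch, assuming only as_strided and reshape), a 4-D strided view of the 2 x 2 windows does exist without copying, but collapsing it to shape (4, 4) forces NumPy to allocate a new array:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

base = np.arange(9)
isize = base.itemsize

# 4-D view: (window row, window col, row inside window, col inside window).
windows = as_strided(base, shape=(2, 2, 2, 2),
                     strides=(3 * isize, isize, 3 * isize, isize))
print(np.shares_memory(windows, base))  # True: still a view

# Merging axes 0 and 1 into one axis of length 4 would require
# stride[0] == stride[1] * shape[1], i.e. 3 == 1 * 2 elements, which fails,
# so reshape cannot return a view and copies instead.
B = windows.reshape(4, 4)
print(B)
print(np.shares_memory(B, base))        # False: B is a new array
```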