Numpy array segmentation

断了今生、忘了曾经 posted on 2021-01-29 18:35:55

Question


I have a NumPy array:

import numpy as np

arr = np.arange(20).reshape(2,10)
arr[1,:] = 0
arr[1,2] = arr[1,5] = arr[1,7] = 1
print(arr)
# [[0 1 2 3 4 5 6 7 8 9]
#  [0 0 1 0 0 1 0 1 0 0]]

I want to extract overlapping sub-arrays, each starting at a 1 (or at the beginning of the array) and ending one column after the next 1 (or at the end of the array). Expected output:

[[0 1 2 3]
 [0 0 1 0]]

[[2 3 4 5 6]
 [1 0 0 1 0]]

[[5 6 7 8]
 [1 0 1 0]]

[[7 8 9]
 [1 0 0]]

At the moment, I have an index-based for-loop that feels awkward in a numpy context and also has to treat the first and last segment as special cases:

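# special case: mark column 0 so the first segment has a start index
arr[1,0] = 1
ind = np.where(arr[1,:])[0]
print(ind)

for i, j in enumerate(ind):
    if not i:
        continue
    # from the previous 1 up to one column past the current 1
    curr = np.copy(arr[:, ind[i-1]:j+2])
    print(curr)

# special case: last segment, from the final 1 to the end of the array
curr = np.copy(arr[:, j:])
print(curr)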

This approach gives me the desired output but I cannot believe there is not a numpier way to achieve this (although the tumbleweed reaction here may indicate this). If there is an easier pandas solution, that would also be fine. The output is ideally a list of these arrays or a similar data structure; the output arrays don't have to be returned individually.


Answer 1:


Here is a partial solution. It is my favorite because it is not complicated:

>>> split_idx = np.flatnonzero(arr[1]) + 2
>>> np.split(arr, split_idx, axis=1)
[array([[0, 1, 2, 3],
        [0, 0, 1, 0]]),
 array([[4, 5, 6],
        [0, 1, 0]]),
 array([[7, 8],
        [1, 0]]),
 array([[9],
        [0]])]

But two things indicate that any NumPy-style approach is a poor fit for this problem:

  • The result is necessarily a list of arrays with different shapes, which NumPy is not designed for, so np.split is quite slow.
  • You can't get the overlapping segments from a single split. np.split returns non-overlapping pieces, so every piece after the first still needs the columns it shares with its predecessor inserted at its beginning.
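
One way around the second point is to skip np.split and slice with explicit start and stop indices instead. The following is only a minimal sketch, not part of the original answer; the names ones, starts, stops and segments are made up for illustration, and it assumes the marker row does not already have a 1 in column 0.

```python
import numpy as np

# Rebuild the array from the question
arr = np.arange(20).reshape(2, 10)
arr[1, :] = 0
arr[1, [2, 5, 7]] = 1

# Column positions of the 1s in the marker row
ones = np.flatnonzero(arr[1])

# Assumption: the first segment starts at column 0, every other one at a 1
starts = np.concatenate(([0], ones))
# Each segment stops one column past the next 1; the last one runs to the end
stops = np.concatenate((ones + 2, [arr.shape[1]]))

# Slices are views into arr; call .copy() on them if independent arrays are needed
segments = [arr[:, a:b] for a, b in zip(starts, stops)]

for seg in segments:
    print(seg)
```

This still produces a Python list of differently shaped arrays, so the first point above applies unchanged; the sketch only removes the special-casing of the first and last segment.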


Source: https://stackoverflow.com/questions/64784980/numpy-array-segmentation
