I have a NumPy array consisting of 0's and 1's. Each sequence of 1's within the array stands for the occurrence of one event. I want to
You want to label each run of 1's, and luckily SciPy has a function for exactly that, scipy.ndimage.label -
In [43]: from scipy.ndimage import label
In [47]: out = label(arr)[0]
In [48]: np.where(arr==0,np.nan,out-1)
Out[48]:
array([nan, nan, nan,  0.,  0.,  0., nan, nan, nan,  1.,  1., nan, nan,
       nan,  2.,  2.,  2.,  2.])
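For anyone who wants to run this outside the session, here is a minimal self-contained sketch. The input `arr` is not shown in the answer, so the array below is an assumption reconstructed from the printed output. `label` returns a tuple of the labeled array and the number of features found; labels start at 1 and zeros stay 0, which is why the snippet takes element `[0]` and subtracts 1.

```python
import numpy as np
from scipy.ndimage import label

# Assumed input, reconstructed from the output printed above.
arr = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1])

# label() returns (labeled_array, num_features). Runs of nonzero values
# are numbered from 1; zeros are background and stay 0.
labeled, n_events = label(arr)

# Shift to 0-based event ids and mark the background as NaN.
event_id = np.where(arr == 0, np.nan, labeled - 1)
print(n_events)
print(event_id)
```

The result has to be a float array because NaN has no integer representation.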
Another approach, using only NumPy -
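def rank_chunks(arr):
    # Prepend False so a run starting at index 0 still shows a rising edge.
    m = np.r_[False,arr.astype(bool)]
    # Rising edges (False -> True) mark the first element of each run of 1s.
    idx = np.flatnonzero(m[:-1] < m[1:])
    # Step by 1 at every run start except the first, then accumulate.
    id_ar = np.zeros(len(arr),dtype=float)
    id_ar[idx[1:]] = 1
    out = id_ar.cumsum()
    # Positions outside any run get NaN.
    out[arr==0] = np.nan
    return out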
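To see why the cumulative sum works, it may help to trace the intermediate values. This is only an illustrative sketch: the input array is assumed (reconstructed from the output printed earlier), and the names `starts`, `steps` and `ids` are made up for the walkthrough.

```python
import numpy as np

# Assumed input, reconstructed from the output printed earlier in the answer.
arr = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1])

# Prepend False so a run starting at index 0 would also count as a 0 -> 1 step.
m = np.r_[False, arr.astype(bool)]

# False < True holds only at rising edges, i.e. where a run of 1s begins.
starts = np.flatnonzero(m[:-1] < m[1:])   # run starts at indices 3, 9, 14

# Put a 1 at every start except the first; the running sum is the run id.
steps = np.zeros(len(arr))
steps[starts[1:]] = 1
ids = steps.cumsum()
print(ids)
```

Before masking, the zeros between runs still carry the id of the preceding run, which is why the final `out[arr==0] = np.nan` step is needed.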
A third approach, with masking and np.repeat -
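def rank_chunks_v2(arr):
    # Pad with False on both sides so every run has a start and a stop edge.
    m = np.r_[False,arr.astype(bool),False]
    # Edges alternate: start, stop, start, stop, ...
    idx = np.flatnonzero(m[:-1] != m[1:])
    # Run lengths = stop - start.
    l = idx[1::2]-idx[::2]
    out = np.full(len(arr),np.nan,dtype=float)
    # Repeat each run's id by its length and write into the nonzero positions.
    out[arr!=0] = np.repeat(np.arange(len(l)),l)
    return out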
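The same kind of trace for the np.repeat version, again as a sketch on an assumed input (reconstructed from the output printed earlier); `edges`, `starts`, `stops` and `lengths` are illustrative names, not part of the answer's code.

```python
import numpy as np

# Assumed input, reconstructed from the output printed earlier in the answer.
arr = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1])

# Pad with False on both ends so the last run (which reaches the end of
# the array) still gets a closing edge.
m = np.r_[False, arr.astype(bool), False]

# Every change of value is an edge; they alternate start, stop, start, ...
edges = np.flatnonzero(m[:-1] != m[1:])
starts, stops = edges[::2], edges[1::2]
lengths = stops - starts                  # one length per run

# Start from all-NaN, then write run id k into the positions of the k-th run.
out = np.full(len(arr), np.nan)
out[arr != 0] = np.repeat(np.arange(len(lengths)), lengths)
print(lengths)
print(out)
```

Since the run lengths sum to the number of nonzero entries, the repeated ids line up exactly with the masked positions.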
Timings (tiling the given input 1,000,000 times) -
In [153]: arr_big = np.tile(arr,1000000)
In [154]: %timeit np.where(arr_big==0,np.nan,label(arr_big)[0]-1)
     ...: %timeit rank_chunks(arr_big)
     ...: %timeit rank_chunks_v2(arr_big)
1 loop, best of 3: 312 ms per loop
1 loop, best of 3: 263 ms per loop
1 loop, best of 3: 229 ms per loop
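Timings are only meaningful if the three approaches agree, so a quick sanity check is worth having. The sketch below compares them on a few edge cases (a run at index 0, all zeros, all ones) and then times them with the standard `timeit` module for use outside IPython. The wrapper name `via_label`, the test cases, and the smaller tile factor of 1000 are all assumptions made for illustration; absolute numbers will differ by machine and library version.

```python
import timeit
import numpy as np
from scipy.ndimage import label

def via_label(arr):
    # Hypothetical wrapper around the scipy.ndimage.label one-liner.
    return np.where(arr == 0, np.nan, label(arr)[0] - 1)

def rank_chunks(arr):
    m = np.r_[False, arr.astype(bool)]
    idx = np.flatnonzero(m[:-1] < m[1:])
    id_ar = np.zeros(len(arr), dtype=float)
    id_ar[idx[1:]] = 1
    out = id_ar.cumsum()
    out[arr == 0] = np.nan
    return out

def rank_chunks_v2(arr):
    m = np.r_[False, arr.astype(bool), False]
    idx = np.flatnonzero(m[:-1] != m[1:])
    l = idx[1::2] - idx[::2]
    out = np.full(len(arr), np.nan, dtype=float)
    out[arr != 0] = np.repeat(np.arange(len(l)), l)
    return out

# Assumed sample input plus a few edge cases.
cases = [
    [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1],
    [1, 0, 1, 1],   # run starts at index 0
    [0, 0, 0],      # no events at all
    [1, 1, 1],      # a single event covering the whole array
]
for case in cases:
    a = np.array(case)
    ref = via_label(a)
    # assert_array_equal treats NaNs in matching positions as equal.
    np.testing.assert_array_equal(rank_chunks(a), ref)
    np.testing.assert_array_equal(rank_chunks_v2(a), ref)

# Small timing harness; the tile factor is far smaller than in the answer.
arr_big = np.tile(np.array(cases[0]), 1000)
for fn in (via_label, rank_chunks, rank_chunks_v2):
    best = min(timeit.repeat(lambda: fn(arr_big), number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.3f} ms per call")
```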