How to count the distance to the previous zero in a pandas Series?

Question

I have the following pandas series (represented as a list):

[7,2,0,3,4,2,5,0,3,4]

I would like to define a new series that gives, for each element, the distance to the most recent zero (elements before the first zero are counted as if a zero preceded the series). That is, I would like the following output:

[1,2,0,1,2,3,4,0,1,2]

What is the most efficient way to do this in pandas?

Answer 1:

The problem itself is O(n); what slows it down is a for loop in Python. If there are k zeros in the series and log k is negligible compared to the length of the series, an O(n log k) vectorized solution is:

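>>> import numpy as np
>>> import pandas as pd
>>> ts = pd.Series([7, 2, 0, 3, 4, 2, 5, 0, 3, 4])
>>> izero = np.r_[-1, np.flatnonzero(ts == 0)]  # indices of zeros, plus a virtual zero at -1
>>> idx = np.arange(len(ts))
>>> idx - izero[np.searchsorted(izero - 1, idx) - 1]  # subtract index of the last zero at or before idx
array([1, 2, 0, 1, 2, 3, 4, 0, 1, 2])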
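For comparison, the O(n) bound mentioned at the start of this answer can also be reached without a Python loop. This is a minimal sketch that swaps searchsorted for a running maximum (np.maximum.accumulate) over the zero positions. The function name distance_to_last_zero is hypothetical:

```python
import numpy as np
import pandas as pd


def distance_to_last_zero(s: pd.Series) -> np.ndarray:
    """Hypothetical helper: distance from each element to the most recent zero."""
    a = s.to_numpy()
    idx = np.arange(len(a))
    # Index of each zero, -1 elsewhere (-1 acts as a virtual zero before the start).
    zero_pos = np.where(a == 0, idx, -1)
    # Running maximum = index of the most recent zero at or before each position.
    last_zero = np.maximum.accumulate(zero_pos)
    return idx - last_zero


s = pd.Series([7, 2, 0, 3, 4, 2, 5, 0, 3, 4])
print(distance_to_last_zero(s))  # [1 2 0 1 2 3 4 0 1 2]
```

Each position only needs the largest zero index seen so far. A single cumulative maximum can therefore replace the binary search, which keeps the cost linear in the length of the series.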

Answer 2:

A pure-pandas solution is a little tricky, but could look like this (s is your Series):

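>>> x = (s != 0).cumsum()                          # running count of nonzero values
>>> y = x != x.shift()                             # True where x changed from the previous row
>>> y.groupby((y != y.shift()).cumsum()).cumsum()  # cumsum within each run of equal y values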
0    1
1    2
2    0
3    1
4    2
5    3
6    4
7    0
8    1
9    2
dtype: int64

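For the last step, this uses the "itertools.groupby" recipe from the pandas cookbook. (y != y.shift()).cumsum() assigns a new label every time y changes value. Within each run of True values the cumsum then counts up from 1, and within each run of False values (the zeros) it stays at 0.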
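To make the recipe concrete, here is a sketch that prints the intermediate run labels for the example series. It also rebuilds the same result with the standard-library itertools.groupby as a cross-check. The names runs, result and expected are illustrative only:

```python
import itertools

import pandas as pd

s = pd.Series([7, 2, 0, 3, 4, 2, 5, 0, 3, 4])
x = (s != 0).cumsum()             # running count of nonzero values
y = x != x.shift()                # True where x changed from the previous row
runs = (y != y.shift()).cumsum()  # new label each time y changes value
result = y.groupby(runs).cumsum()

# Same run structure via itertools.groupby: count 1..n inside a run of True
# values, and emit 0 for every element of a run of False values.
expected = []
for key, grp in itertools.groupby(y):
    n = len(list(grp))
    expected.extend(range(1, n + 1) if key else [0] * n)

print(runs.tolist())      # [1, 1, 2, 3, 3, 3, 3, 4, 5, 5]
print(result.tolist())
print(expected)           # [1, 2, 0, 1, 2, 3, 4, 0, 1, 2]
```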

Answer 3:

It's sometimes surprising how simple it is to get C-like speed for this kind of thing with Cython. Assuming your column's .values gives a 1-D int64 array arr, then:

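import numpy as np
cimport cython
from libc.stdint cimport int64_t

@cython.boundscheck(False)  # skip bounds checks on memoryview indexing
@cython.wraparound(False)   # no negative-index handling
def dist_to_last_zero(const int64_t[:] arr):  # illustrative name
    # arr: 1-D int64 buffer such as s.values; const also accepts read-only arrays
    ret = np.zeros(arr.shape[0], dtype=np.int64)
    cdef int64_t[:] ret_view = ret
    cdef Py_ssize_t i
    cdef int64_t zero_count = 0
    for i in range(arr.shape[0]):
        # reset at each zero, otherwise one more step since the last zero
        zero_count = 0 if arr[i] == 0 else zero_count + 1
        ret_view[i] = zero_count
    return ret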
Note the use of typed memoryviews, which give fast, C-level element access. Applying @cython.boundscheck(False) (and @cython.wraparound(False)) to the function containing this loop speeds it up further by skipping per-index safety checks.
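The same loop in plain Python is useful as a reference to check a compiled version against. It is a sketch, and the name dist_to_last_zero_py is hypothetical. It is considerably slower than the Cython version on large inputs:

```python
import pandas as pd


def dist_to_last_zero_py(values):
    """Hypothetical reference: distance to the most recent zero, counting
    from a virtual zero before the start of the sequence."""
    out = []
    count = 0
    for v in values:
        # Reset at each zero, otherwise one more step since the last zero.
        count = 0 if v == 0 else count + 1
        out.append(count)
    return out


s = pd.Series([7, 2, 0, 3, 4, 2, 5, 0, 3, 4])
print(dist_to_last_zero_py(s.tolist()))  # [1, 2, 0, 1, 2, 3, 4, 0, 1, 2]
```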

Source: https://stackoverflow.com/questions/30730981/how-to-count-distance-to-the-previous-zero-in-pandas-series

Tags

python

numpy

pandas

series