python pandas conditional cumulative sum

一个人的身影 2020-12-15 10:49

Consider my dataframe df

data  data_binary  sum_data
  2       1            1
  5       0            0
  1       1            1
  4       1              


        
3 Answers
  •  不思量自难忘°
    2020-12-15 11:07

    You want to take the cumulative sum of data_binary and subtract the most recent cumulative sum where data_binary was zero.

    b = df.data_binary
    c = b.cumsum()  # running total that never resets
    # subtract the running total as of the most recent zero row
    c.sub(c.mask(b != 0).ffill(), fill_value=0).astype(int)
    
    0    1
    1    0
    2    1
    3    2
    4    3
    5    0
    6    0
    7    1
    Name: data_binary, dtype: int64
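
The snippet above needs a df to run against. Here is a minimal self-contained sketch, assuming the full data_binary column is the eight values shown in the step-by-step table in the explanation. The names `baseline` and `result` are introduced here for illustration only.

```python
import pandas as pd

# Assumed input: the data_binary values read off the step-by-step table
df = pd.DataFrame({"data_binary": [1, 0, 1, 1, 1, 0, 0, 1]})

b = df["data_binary"]
c = b.cumsum()  # running total that never resets

# Keep the running total only at zero rows, then carry it forward
baseline = c.mask(b != 0).ffill()

# Rows before the first zero have no baseline; fill_value=0 treats it as 0
result = c.sub(baseline, fill_value=0).astype(int)
print(result.tolist())
```

This should print the same values as the result shown above: 1, 0, 1, 2, 3, 0, 0, 1.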
    

    Explanation

    Let's start by looking at each step side by side

    import pandas as pd

    # one column per intermediate step, using b and c from above
    cols = ['data_binary', 'cumulative_sum', 'nan_non_zero', 'forward_fill', 'final_result']
    print(pd.concat([
            b, c,
            c.mask(b != 0),
            c.mask(b != 0).ffill(),
            c.sub(c.mask(b != 0).ffill(), fill_value=0).astype(int)
        ], axis=1, keys=cols))
    
    
       data_binary  cumulative_sum  nan_non_zero  forward_fill  final_result
    0            1               1           NaN           NaN             1
    1            0               1           1.0           1.0             0
    2            1               2           NaN           1.0             1
    3            1               3           NaN           1.0             2
    4            1               4           NaN           1.0             3
    5            0               4           4.0           4.0             0
    6            0               4           4.0           4.0             0
    7            1               5           NaN           4.0             1
    

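    The problem with cumulative_sum is that rows where data_binary is zero do not reset the sum, and that is the motivation for this solution. How do we "reset" the sum when data_binary is zero? Easy! I keep the cumulative sum only at the rows where data_binary is zero (every other row is masked to NaN) and forward fill those values, so each row carries the running total as of the most recent zero. Subtracting this from the cumulative sum effectively resets the sum at every zero. Rows before the first zero have nothing to subtract, which is why fill_value=0 is passed to sub.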
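
To make the intended "reset" behaviour concrete, here is a plain-Python sketch of the same rule written as an explicit loop. It is not part of the answer's method, only a reference to compare against; the function name `running_sum_with_reset` is hypothetical, and the input is the data_binary column from the table above.

```python
def running_sum_with_reset(flags):
    """Running sum of flags that drops back to 0 whenever a flag is 0."""
    total = 0
    out = []
    for flag in flags:
        if flag == 0:
            total = 0        # a zero row resets the sum
        else:
            total += flag    # otherwise keep accumulating
        out.append(total)
    return out

expected = running_sum_with_reset([1, 0, 1, 1, 1, 0, 0, 1])
print(expected)
```

The loop gives the same values as the final_result column, which is a quick way to check the vectorized version on small inputs.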
