Using cumsum in pandas on group()

难免孤独 2020-12-28 21:33

From a pandas newbie: I have data that looks essentially like this:

 data1 = pd.DataFrame({'Dir': ['E', 'E', 'W', 'W', 'E', 'W', 'W', 'E'], 'Bool'


        
2 Answers
    暗喜 (original poster)
    2020-12-28 21:55

    As the other answer points out, you're trying to collapse identical dates into single rows, whereas a grouped cumsum returns a result with the same number of rows as the original DataFrame. In other words, you want to group by [Bool, Dir, Date], sum within each group, and THEN take a cumsum over the rows grouped by [Bool, Dir]. The other answer is a perfectly valid solution to your specific question; here's a one-liner variation:

    data1.groupby(['Bool', 'Dir', 'Date']).sum().groupby(level=[0, 1]).cumsum()
    

    This returns output exactly in the requested format.
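A self-contained sketch of the same sum-then-cumsum idea, split into its two steps. The question's DataFrame is truncated, so this uses a hypothetical frame: only the Dir values come from the question, while the Bool, Date and Data values are made up. It also selects the Data column before summing so each step yields a Series; the one-liner above sums every remaining numeric column instead.

```python
import pandas as pd

# Hypothetical data: only the Dir column comes from the question;
# Bool, Date and Data are made-up values for illustration.
data1 = pd.DataFrame(
    {
        "Dir": ["E", "E", "W", "W", "E", "W", "W", "E"],
        "Bool": ["Y", "Y", "N", "N", "Y", "N", "Y", "N"],
        "Date": pd.to_datetime(
            ["2000-12-30", "2000-12-30", "2000-12-30", "2001-01-02",
             "2001-01-03", "2001-01-02", "2001-01-03", "2000-12-30"]
        ),
        "Data": [1, 2, 3, 4, 5, 6, 7, 8],
    }
)

# Step 1: collapse rows that share (Bool, Dir, Date) into one summed value.
daily = data1.groupby(["Bool", "Dir", "Date"])["Data"].sum()

# Step 2: running total within each (Bool, Dir) group over the collapsed
# dates. level=["Bool", "Dir"] is equivalent to level=[0, 1] here.
running = daily.groupby(level=["Bool", "Dir"]).cumsum()

print(running)
```

With these made-up values, the two N/W rows dated 2001-01-02 (4 and 6) collapse into a single entry of 10, and that group's running total ends at 13, so the result has six rows instead of eight.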

    For those looking for a simple cumsum on a Pandas group, you can use:

    data1.groupby(['Bool', 'Dir']).apply(lambda x: x['Data'].cumsum())
    

    The cumulative sum is calculated within each group. Here's what the output looks like:

    Bool  Dir            
    N     E    2000-12-30     5
               2000-12-30    16
          W    2001-01-02     7
               2001-01-03    16
    Y     E    2000-12-30     4
               2001-01-03    12
          W    2000-12-30     6
               2000-12-30    16
    Name: Data, dtype: int64
    

    Note the repeated dates: rather than collapsing them, this computes a strict cumulative sum over the rows of each group identified by the Bool and Dir columns.
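For comparison, a minimal sketch of the same per-group running total using the built-in grouped cumsum instead of apply. It assumes a hypothetical frame (only the Dir values come from the question; Bool, Date, Data and the RunningData column name are made up). Whereas the output shown above is indexed by the group keys, this form returns one value per original row, aligned on the original index and in the original order, so it can be stored directly as a new column.

```python
import pandas as pd

# Hypothetical data: only the Dir column comes from the question;
# Bool, Date and Data are made-up values for illustration.
data1 = pd.DataFrame(
    {
        "Dir": ["E", "E", "W", "W", "E", "W", "W", "E"],
        "Bool": ["Y", "Y", "N", "N", "Y", "N", "Y", "N"],
        "Date": pd.to_datetime(
            ["2000-12-30", "2000-12-30", "2000-12-30", "2001-01-02",
             "2001-01-03", "2001-01-02", "2001-01-03", "2000-12-30"]
        ),
        "Data": [1, 2, 3, 4, 5, 6, 7, 8],
    }
)

# Grouped cumsum is a transform: the result has one value per input row,
# aligned on data1's index, so repeated dates are kept, not collapsed.
# "RunningData" is a hypothetical column name.
data1["RunningData"] = data1.groupby(["Bool", "Dir"])["Data"].cumsum()

# Rows stay in their original order; each value is the running total of
# Data within that row's (Bool, Dir) group up to and including that row.
print(data1)
```

Here the two N/W rows dated 2001-01-02 stay separate, with running totals 7 and 13, which mirrors the repeated dates in the output above.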
