Splitting a dataframe based on column values

难免孤独 2021-01-07 12:01

I have a dataframe like this:

 EndDate
2007-10-31              0
2007-11-30    -0.03384464
2007-12-31     -0.0336299
2008-01-31   -0.009448923
2008-02-29              


        
2 Answers
  • 2021-01-07 12:31

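    The groupby snippet in Alexander's answer didn't work for me as posted. There is a small error: enumerate over .groups yields a counter and the group keys, not the row labels. Iterating over .groups.items() gives the rows for each group, so the code should be: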

    # .groups maps each group key to the index labels of its rows,
    # so select by label with .loc
    d = {n: df2.loc[rows]
         for n, rows in df2.groupby('group_no').groups.items()}
    
  • 2021-01-07 12:46

    First, you can create group numbers by comparing the value column to zero and then taking a cumulative sum of these boolean values.

    df['group_no'] = (df.val == 0).cumsum()
    >>> df.head(6)
          EndDate       val  group_no
    0  2007-10-31  0.000000         1
    1  2007-11-30 -0.033845         1
    2  2007-12-31 -0.033630         1
    3  2008-01-31 -0.009449         1
    4  2008-02-29  0.000000         2
    5  2008-03-31 -0.057450         2
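
To see why the cumulative sum produces group numbers, here is a minimal plain-Python sketch of the same idea, without pandas. The list `vals` holds the six values displayed above; the names `is_start` and `group_no` are chosen only for this illustration.

```python
from itertools import accumulate

# Values as displayed in the head(6) output above.
vals = [0.0, -0.033845, -0.033630, -0.009449, 0.0, -0.057450]

# True wherever a new block starts, i.e. the value is exactly zero.
is_start = [v == 0 for v in vals]

# Running total of the booleans: each True bumps the counter by one,
# so every row inherits the number of the most recent zero row.
group_no = list(accumulate(int(flag) for flag in is_start))

print(group_no)  # [1, 1, 1, 1, 2, 2]
```

Each zero acts as a boundary marker, and the running total turns those markers into a label that is shared by all rows up to the next zero.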
    

    Next, you can use a dictionary comprehension together with loc to select the rows belonging to each group_no. The last group number is the last value in the column, read here with iat (position-based scalar access). Because range excludes its upper bound, add 1 so that the final group is included.

    d = {i: df.loc[df.group_no == i, ['EndDate', 'val']]
         for i in range(1, df.group_no.iat[-1] + 1)}
    
    >>> d
    {1:       EndDate       val
     0  2007-10-31  0.000000
     1  2007-11-30 -0.033845
     2  2007-12-31 -0.033630
     3  2008-01-31 -0.009449, 
     2:       EndDate       val
     4  2008-02-29  0.000000
     5  2008-03-31 -0.057450
     6  2008-04-30 -0.038694, 
     3:       EndDate       val
     7  2008-05-31  0.000000
     8  2008-06-30 -0.036245
     9  2008-07-31 -0.005286}
    

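    EDIT: As suggested by @DSM, using groupby appears to be about 6x faster, based on a sample dataframe with 15k rows.

    # .groups maps each group key to the index labels of its rows;
    # select by label with .loc (.ix is no longer available in pandas >= 1.0)
    d = {n: df2.loc[rows]
         for n, rows in df2.groupby('group_no').groups.items()}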
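
As a possible alternative to indexing through `.groups`, one could iterate over the GroupBy object itself, which yields (group key, sub-dataframe) pairs. The following is a self-contained sketch, not a benchmarked replacement. The ten rows are copied from the output shown in this answer, and it uses `df` rather than `df2`, whose contents are not shown here.

```python
import pandas as pd

# Sample rows copied from the output shown in this answer.
df = pd.DataFrame({
    "EndDate": ["2007-10-31", "2007-11-30", "2007-12-31", "2008-01-31",
                "2008-02-29", "2008-03-31", "2008-04-30",
                "2008-05-31", "2008-06-30", "2008-07-31"],
    "val": [0.0, -0.033845, -0.033630, -0.009449,
            0.0, -0.057450, -0.038694,
            0.0, -0.036245, -0.005286],
})

# A zero marks the start of a new block; the running count labels the blocks.
df["group_no"] = (df["val"] == 0).cumsum()

# Iterating over a GroupBy yields (group key, sub-dataframe) pairs.
d = {n: g[["EndDate", "val"]] for n, g in df.groupby("group_no")}

# Number of rows per group.
print({int(n): len(g) for n, g in d.items()})  # {1: 4, 2: 3, 3: 3}
```

One difference from the range-based comprehension: if the first row were not zero, those leading rows would get group number 0, which `range(1, ...)` skips but groupby keeps.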
    