Calculate time difference between Pandas Dataframe indices

执念已碎 2020-11-27 11:46

I am trying to add a column, deltaT, to a DataFrame, where deltaT is the time difference between successive rows (as given by the time-series index).

time             


        
3 answers
  •  予麋鹿 (OP)
    2020-11-27 12:17

    We can use to_series to create a Series whose index and values are both the index keys, then take the difference between successive rows with diff, which yields a Series of dtype timedelta64[ns]. Through the .dt accessor we can then read the seconds attribute (the seconds component of each timedelta, excluding whole days) and divide each element by 60 to express it in minutes, optionally filling the first value, which has no predecessor, with 0.

    In [13]: df['deltaT'] = df.index.to_series().diff().dt.seconds.div(60, fill_value=0)
        ...: df                                 # use .astype(int) to obtain integer values
    Out[13]: 
                         value  deltaT
    time                              
    2012-03-16 23:50:00      1     0.0
    2012-03-16 23:56:00      2     6.0
    2012-03-17 00:08:00      3    12.0
    2012-03-17 00:10:00      4     2.0
    2012-03-17 00:12:00      5     2.0
    2012-03-17 00:20:00      6     8.0
    2012-03-20 00:43:00      7    23.0
    
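    For a self-contained version of the one-liner above, the sketch below rebuilds a sample frame and applies the same expression. The question's original data is not reproduced on this page, so the seven timestamps and the value column here are assumptions taken from the answer's output.

```python
import pandas as pd

# Sample data taken from the answer's output: a 'value' column
# indexed by a DatetimeIndex named 'time'.
times = pd.to_datetime([
    "2012-03-16 23:50:00",
    "2012-03-16 23:56:00",
    "2012-03-17 00:08:00",
    "2012-03-17 00:10:00",
    "2012-03-17 00:12:00",
    "2012-03-17 00:20:00",
    "2012-03-20 00:43:00",
])
df = pd.DataFrame({"value": range(1, 8)}, index=pd.DatetimeIndex(times, name="time"))

# diff() gives the gap to the previous row as timedelta64[ns] (first row is NaT).
# .dt.seconds keeps only the seconds component (whole days are dropped),
# and div(60, fill_value=0) converts to minutes, turning the leading NaN into 0.
df["deltaT"] = df.index.to_series().diff().dt.seconds.div(60, fill_value=0)
print(df)
```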

    Breaking it down step by step.

    Taking the difference between successive index values with diff:

    In [8]: ser_diff = df.index.to_series().diff()
    
    In [9]: ser_diff
    Out[9]: 
    time
    2012-03-16 23:50:00               NaT
    2012-03-16 23:56:00   0 days 00:06:00
    2012-03-17 00:08:00   0 days 00:12:00
    2012-03-17 00:10:00   0 days 00:02:00
    2012-03-17 00:12:00   0 days 00:02:00
    2012-03-17 00:20:00   0 days 00:08:00
    2012-03-20 00:43:00   3 days 00:23:00
    Name: time, dtype: timedelta64[ns]
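    diff on a Series subtracts each element's predecessor, so it is equivalent to subtracting a copy of the Series shifted down by one row. A small sketch using the last three timestamps from the example (an illustrative subset, not the full frame):

```python
import pandas as pd

# Illustrative subset: the last three timestamps from the example index.
idx = pd.DatetimeIndex(
    ["2012-03-17 00:12:00", "2012-03-17 00:20:00", "2012-03-20 00:43:00"],
    name="time",
)
s = idx.to_series()

# diff() subtracts each value's predecessor; shift() moves values down one row,
# so s - s.shift() spells out the same computation explicitly.
via_diff = s.diff()
via_shift = s - s.shift()

print(via_diff)
print(via_diff.equals(via_shift))  # True: both start with NaT, then identical timedeltas
```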
    

    Converting the seconds component to minutes:

    In [10]: ser_diff.dt.seconds.div(60, fill_value=0)
    Out[10]: 
    time
    2012-03-16 23:50:00     0.0
    2012-03-16 23:56:00     6.0
    2012-03-17 00:08:00    12.0
    2012-03-17 00:10:00     2.0
    2012-03-17 00:12:00     2.0
    2012-03-17 00:20:00     8.0
    2012-03-20 00:43:00    23.0
    Name: time, dtype: float64
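    The seconds attribute is only the seconds part of each timedelta once whole days are split off, which is why the 3-day gap in the last row shows up as just 23 minutes. A minimal check on a single pandas Timedelta (Series.dt.seconds applies the same attribute element-wise):

```python
import pandas as pd

# The gap between the last two rows in the example: 3 days and 23 minutes.
gap = pd.Timestamp("2012-03-20 00:43:00") - pd.Timestamp("2012-03-17 00:20:00")

print(gap)               # 3 days 00:23:00
print(gap.days)          # 3
print(gap.seconds)       # 1380, the seconds component only (days dropped)
print(gap.seconds / 60)  # 23.0, matching the last row above
```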
    

    If you also want to include the days component, which .dt.seconds ignores (it only considers the time-of-day portion of each difference), use .dt.total_seconds() instead: it returns the full elapsed duration in seconds, which can again be divided by 60 to get minutes.

    In [12]: ser_diff.dt.total_seconds().div(60, fill_value=0)
    Out[12]: 
    time
    2012-03-16 23:50:00       0.0
    2012-03-16 23:56:00       6.0
    2012-03-17 00:08:00      12.0
    2012-03-17 00:10:00       2.0
    2012-03-17 00:12:00       2.0
    2012-03-17 00:20:00       8.0
    2012-03-20 00:43:00    4343.0    # <-- 3 days and 23 minutes = 3 * 1440 + 23 minutes
    Name: time, dtype: float64
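    An alternative that also counts whole days, offered as a sketch rather than as part of the original answer, is to divide the timedelta Series by a one-minute pd.Timedelta. This yields float minutes directly, with NaT becoming NaN, so fillna(0) plays the role of fill_value=0. The sample timestamps are again assumed from the output above.

```python
import pandas as pd

# Timestamps taken from the answer's output.
idx = pd.DatetimeIndex(
    [
        "2012-03-16 23:50:00", "2012-03-16 23:56:00", "2012-03-17 00:08:00",
        "2012-03-17 00:10:00", "2012-03-17 00:12:00", "2012-03-17 00:20:00",
        "2012-03-20 00:43:00",
    ],
    name="time",
)
ser_diff = idx.to_series().diff()

# timedelta64 / Timedelta -> float64 in units of the divisor; NaT -> NaN.
minutes = (ser_diff / pd.Timedelta(minutes=1)).fillna(0)

# Cross-check against the total_seconds() approach.
same = minutes.equals(ser_diff.dt.total_seconds().div(60, fill_value=0))
print(minutes)
print(same)
```

    Dividing by pd.Timedelta(hours=1) or pd.Timedelta(seconds=1) instead would express the same gaps in hours or seconds.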
    
