Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes

后端 未结 3 1565
甜味超标
甜味超标 2020-11-22 12:14

I have two columns, fromdate and todate, in a dataframe.

import pandas as pd

data = {\'todat         


        
相关标签:
3条回答
  • 2020-11-22 12:41

    This was driving me bonkers as the .astype() solution above didn't work for me. But I found another way. Haven't timed it or anything, but might work for others out there:

    t1 = pd.to_datetime('1/1/2015 01:00')
    t2 = pd.to_datetime('1/1/2015 03:30')
    
    print pd.Timedelta(t2 - t1).seconds / 3600.0
    

    ...if you want hours. Or:

    print pd.Timedelta(t2 - t1).seconds / 60.0
    

    ...if you want minutes.

    0 讨论(0)
  • 2020-11-22 12:55
    • How do I convert my results to only hours and minutes
      • The accepted answer only returns days + hours. Minutes are not included.
    • To provide a column that has hours and minutes, as hh:mm or x hours y minutes, would require additional calculations and string formatting.
    • This answer shows how to get either total hours or total minutes as a float, using timedelta math, and is faster than using .astype('timedelta64[h]')
    • Pandas Time Deltas User Guide
    • Pandas Time series / date functionality User Guide
    • python timedelta objects: See supported operations.
    import pandas as pd
    
    # test data from OP, with values already in a datetime format
    data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000'), pd.Timestamp('2014-01-23 10:07:47.660000')],
            'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000'), pd.Timestamp('2014-01-23 18:50:41.420000')]}
    
    # test dataframe; the columns must be in a datetime format; use pandas.to_datetime if needed
    df = pd.DataFrame(data)
    
    # add a timedelta column if wanted. It's added here for information only
    # df['time_delta_with_sub'] = df.from_date.sub(df.to_date)  # also works
    df['time_delta'] = (df.from_date - df.to_date)
    
    # create a column with timedelta as total hours, as a float type
    df['tot_hour_diff'] = (df.from_date - df.to_date) / pd.Timedelta(hours=1)
    
    # create a colume with timedelta as total minutes, as a float type
    df['tot_mins_diff'] = (df.from_date - df.to_date) / pd.Timedelta(minutes=1)
    
    # display(df)
                      to_date               from_date             time_delta  tot_hour_diff  tot_mins_diff
    0 2014-01-24 13:03:12.050 2014-01-26 23:41:21.870 2 days 10:38:09.820000      58.636061    3518.163667
    1 2014-01-27 11:57:18.240 2014-01-27 15:38:22.540 0 days 03:41:04.300000       3.684528     221.071667
    2 2014-01-23 10:07:47.660 2014-01-23 18:50:41.420 0 days 08:42:53.760000       8.714933     522.896000
    

    Other methods

    • An item of note from the podcast in Other Resources, .total_seconds() was added and merged when the core developer was on vacation, and would not have been approved.
      • This is also why there aren't other .total_xx methods.
    # convert the entire timedelta to seconds
    # this is the same as td / timedelta(seconds=1)
    (df.from_date - df.to_date).dt.total_seconds()
    [out]:
    0    211089.82
    1     13264.30
    2     31373.76
    dtype: float64
    
    # get the number of days
    (df.from_date - df.to_date).dt.days
    [out]:
    0    2
    1    0
    2    0
    dtype: int64
    
    # get the seconds for hours + minutes + seconds, but not days
    # note the difference from total_seconds
    (df.from_date - df.to_date).dt.seconds
    [out]:
    0    38289
    1    13264
    2    31373
    dtype: int64
    

    Other Resources

    • Talk Python to Me: Episode #271: Unlock the mysteries of time, Python's datetime that is!
      • Timedelta begins at 31 minutes
      • As per Python core developer Paul Ganssle and python dateutil maintainer:
        • Use (df.from_date - df.to_date) / pd.Timedelta(hours=1)
        • Don't use (df.from_date - df.to_date).dt.total_seconds() / 3600
          • pandas.Series.dt.total_seconds
          • .dt accessor
    • Real Python: Using Python datetime to Work With Dates and Times
    • The dateutil module provides powerful extensions to the standard datetime module.

    %%timeit test

    import pandas as pd
    
    # dataframe with 2M rows
    data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000')], 'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000')]}
    df = pd.DataFrame(data)
    df = pd.concat([df] * 1000000).reset_index(drop=True)
    
    %%timeit
    (df.from_date - df.to_date) / pd.Timedelta(hours=1)
    [out]:
    43.1 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    
    %%timeit
    (df.from_date - df.to_date).astype('timedelta64[h]')
    [out]:
    59.8 ms ± 1.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    
    0 讨论(0)
  • 2020-11-22 13:01

    Pandas timestamp differences returns a datetime.timedelta object. This can easily be converted into hours by using the *as_type* method, like so

    import pandas
    df = pandas.DataFrame(columns=['to','fr','ans'])
    df.to = [pandas.Timestamp('2014-01-24 13:03:12.050000'), pandas.Timestamp('2014-01-27 11:57:18.240000'), pandas.Timestamp('2014-01-23 10:07:47.660000')]
    df.fr = [pandas.Timestamp('2014-01-26 23:41:21.870000'), pandas.Timestamp('2014-01-27 15:38:22.540000'), pandas.Timestamp('2014-01-23 18:50:41.420000')]
    (df.fr-df.to).astype('timedelta64[h]')
    

    to yield,

    0    58
    1     3
    2     8
    dtype: float64
    
    0 讨论(0)
提交回复
热议问题