Finding the mean and standard deviation of a timedelta object in pandas df

太阳男子 2020-12-16 11:29

I would like to calculate the mean and standard deviation of a timedelta column, grouped by bank, from a DataFrame with the two columns shown.

4 answers
  • 2020-12-16 12:05

    There is no need to convert the timedelta back and forth. NumPy and pandas can aggregate timedelta values directly, which is also faster. Using your dropped DataFrame:

    import numpy as np
    
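    # Select the timedelta column within each bank
    grouped = dropped.groupby('bank')['diff']
    
    mean = grouped.apply(lambda x: np.mean(x))
    # np.std defaults to ddof=0 (population standard deviation)
    std = grouped.apply(lambda x: np.std(x))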
    
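To make the answer above concrete, here is a minimal sketch on a made-up frame. The sample data is hypothetical (the original `dropped` frame is not shown in the thread), and it calls the Series methods rather than the NumPy functions so the `ddof` choice is explicit: pandas' `Series.std` defaults to `ddof=1` (sample), while `np.std` defaults to `ddof=0` (population).

```python
import pandas as pd

# Hypothetical stand-in for the `dropped` frame from the question.
dropped = pd.DataFrame({
    "bank": ["A", "A", "B", "B", "B"],
    "diff": pd.to_timedelta([1, 3, 2, 4, 6], unit="D"),
})

grouped = dropped.groupby("bank")["diff"]

# Each group is a timedelta Series, so mean() and std() return Timedelta values.
mean = grouped.apply(lambda s: s.mean())
std_sample = grouped.apply(lambda s: s.std())            # ddof=1, pandas default
std_population = grouped.apply(lambda s: s.std(ddof=0))  # ddof=0, NumPy default

print(mean)
print(std_sample)
print(std_population)
```

Which `ddof` is appropriate depends on whether the rows are treated as a sample or as the whole population; the two results differ noticeably for small groups.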
  • 2020-12-16 12:14

    I would suggest passing numeric_only=False to mean(), as mentioned by Alexander Usikov; this works for pandas 0.20+.

    If you have an older version, the following works:

    import pandas as pd
    
    df = pd.DataFrame({
        'td': pd.Series([pd.Timedelta(days=i) for i in range(5)]),
        'group': ['a', 'a', 'a', 'b', 'b']
    })
    
    (
        df
        .astype({'td': 'int64'})     # convert timedelta to 64-bit integer (nanoseconds)
        .groupby('group')
        .mean()
        .astype({'td': 'timedelta64[ns]'})
    )
    
  • 2020-12-16 12:21

    pandas' mean() and other aggregation methods accept a numeric_only=False parameter.

    dropped.groupby('bank').mean(numeric_only=False)
    

    Found here: Aggregations for Timedelta values in the Python DataFrame

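A small self-contained sketch of the `numeric_only=False` call from the answer above. The frame is hypothetical sample data with the same column names as the question (`bank`, `diff`), and the behavior is assumed to match a reasonably recent pandas release.

```python
import pandas as pd

# Hypothetical stand-in for the `dropped` frame from the question.
dropped = pd.DataFrame({
    "bank": ["A", "A", "B", "B", "B"],
    "diff": pd.to_timedelta([1, 3, 2, 4, 6], unit="D"),
})

# numeric_only=False keeps the timedelta column in the aggregation
# instead of restricting it to purely numeric columns.
means = dropped.groupby("bank").mean(numeric_only=False)

print(means)
```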
  • 2020-12-16 12:28

    You need to convert the timedelta to a numeric value, e.g. int64 via .values. This is the most accurate option, because nanoseconds are the underlying numeric representation of a timedelta:

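    import numpy as np
    import pandas as pd
    
    # timedelta64[ns] -> int64 nanoseconds
    dropped['new'] = dropped['diff'].values.astype(np.int64)
    
    means = dropped.groupby('bank').mean()
    means['new'] = pd.to_timedelta(means['new'])  # nanoseconds back to timedelta
    
    std = dropped.groupby('bank').std()
    std['new'] = pd.to_timedelta(std['new'])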
    

    Another solution is to convert the values to seconds with total_seconds(), but that is less accurate:

    dropped['new'] = dropped['diff'].dt.total_seconds()
    
    means = dropped.groupby('bank').mean()
    
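The two conversions in the answer above can be sketched end to end as follows. The sample frame and the column names `ns` and `secs` are hypothetical, chosen only for illustration; the results are converted back to timedelta so they can be compared.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `dropped` frame from the question.
dropped = pd.DataFrame({
    "bank": ["A", "A", "B", "B", "B"],
    "diff": pd.to_timedelta([1, 3, 2, 4, 6], unit="D"),
})

# Option 1: integer nanoseconds, the native representation of timedelta64[ns].
dropped["ns"] = dropped["diff"].values.astype(np.int64)
by_bank_ns = dropped.groupby("bank")["ns"]
mean_ns = pd.to_timedelta(by_bank_ns.mean())  # numbers are read as nanoseconds
std_ns = pd.to_timedelta(by_bank_ns.std())    # sample std (ddof=1)

# Option 2: float seconds via total_seconds(), then back with unit="s".
dropped["secs"] = dropped["diff"].dt.total_seconds()
mean_secs = pd.to_timedelta(dropped.groupby("bank")["secs"].mean(), unit="s")

print(mean_ns)
print(std_ns)
print(mean_secs)
```

For whole-day values like these, both routes give the same means; the precision difference the answer mentions only matters when sub-second detail is significant.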