Datetime objects with pandas mean function

臣服心动 2020-12-11 04:00

I am new to programming, so I apologize in advance if this question does not make any sense. I noticed that when I try to calculate the mean value of a pandas data frame with

3 Answers
  • 2020-12-11 04:32

    To simplify Alex's answer (I would have added this as a comment but I don't have sufficient reputation):

    import datetime
    import pandas as pd
    
    d={'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
       'two': pd.Series([datetime.datetime(2014, 7, 9), 
               datetime.datetime(2014, 7, 10), 
               datetime.datetime(2014, 7, 11) ], 
               index=['a', 'b', 'c'])}
    df = pd.DataFrame(d)
    

    Which looks like:

       one        two
    a    1 2014-07-09
    b    2 2014-07-10
    c    3 2014-07-11
    

    Then calculate the mean of column "two" by:

    (df.two - df.two.min()).mean() + df.two.min()
    

    So, subtract the min of the timeseries, calculate the mean (or median) of the resulting timedeltas, and add back the min.
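    The same recipe can be wrapped in a small function that also covers the median variant. This is a minimal sketch: the helper name datetime_center and its how parameter are hypothetical, and the dates are deliberately uneven so that the mean and the median differ:

```python
import pandas as pd


def datetime_center(series, how="mean"):
    """Mean or median of a datetime Series via the subtract-the-minimum trick.

    Hypothetical helper for illustration. Subtracting the minimum turns the
    datetimes into timedeltas, which pandas can average; adding the minimum
    back turns the result into a Timestamp again.
    """
    origin = series.min()
    deltas = series - origin  # timedelta64 Series
    if how == "mean":
        center = deltas.mean()
    elif how == "median":
        center = deltas.median()
    else:
        raise ValueError(f"unsupported how={how!r}")
    return origin + center


dates = pd.Series(pd.to_datetime(["2014-07-09", "2014-07-10", "2014-07-14"]))
print(datetime_center(dates))            # 2014-07-11 00:00:00
print(datetime_center(dates, "median"))  # 2014-07-10 00:00:00
```

    The offsets are 0, 1 and 5 days, so the mean offset is 2 days and the median offset is 1 day.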

  • 2020-12-11 04:40

    This issue is partly resolved as of pandas 0.25: mean now works on a datetime Series. At the time of writing, however, it still cannot be applied to a datetime column inside a DataFrame.

    In [1]: import pandas as pd
    
    In [2]: s = pd.Series([pd.Timestamp(2014, 7, 9),
       ...:                pd.Timestamp(2014, 7, 10),
       ...:                pd.Timestamp(2014, 7, 11)])
    
    In [3]: s.mean()
    Out[3]: Timestamp('2014-07-10 00:00:00')
    

    Calling .mean() on a DataFrame that contains a datetime column, however, drops that column from the result, just as in the original question:

    In [4]: df = pd.DataFrame({'numeric':[1,2,3],
       ...:               'datetime':s})
    
    In [5]: df.mean()
    Out[5]: 
    numeric    2.0
    dtype: float64
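    One workaround that does not depend on how DataFrame.mean treats datetime columns is to reduce the numeric and datetime columns separately, calling Series.mean on each datetime column. This is a minimal sketch under that assumption, reusing the column names from the session above:

```python
import pandas as pd

df = pd.DataFrame({
    "numeric": [1, 2, 3],
    "datetime": pd.to_datetime(["2014-07-09", "2014-07-10", "2014-07-11"]),
})

# Numeric columns: the ordinary DataFrame reduction.
numeric_means = df.select_dtypes(include="number").mean()

# Datetime columns: Series.mean works on datetime64 data, so call it per column.
datetime_means = {
    name: df[name].mean()
    for name in df.select_dtypes(include="datetime").columns
}

print(numeric_means["numeric"])    # 2.0
print(datetime_means["datetime"])  # 2014-07-10 00:00:00
```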
    
  • 2020-12-11 04:51

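    You can use datetime.timedelta arithmetic: subtract the earliest timestamp, average the resulting timedeltas, and add the earliest timestamp back.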

    import functools
    import operator
    import datetime
    
    import pandas as pd
    
    d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
         'two': pd.Series([datetime.datetime(2014, 7, 9),
                           datetime.datetime(2014, 7, 10),
                           datetime.datetime(2014, 7, 11)],
                          index=['a', 'b', 'c'])}
    df = pd.DataFrame(d)
    
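    def avg_datetime(series):
        dt_min = series.min()
        # Offsets from the earliest timestamp are timedeltas, which can be summed
        deltas = [x - dt_min for x in series]
        # Average the offsets and shift the result back onto the timeline
        return dt_min + functools.reduce(operator.add, deltas) / len(deltas)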
    
    print(avg_datetime(df['two']))
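    The same idea also works without pandas, on plain datetime.datetime values from the standard library. This is a minimal sketch; avg_datetime_stdlib is a hypothetical name chosen so it does not clash with avg_datetime above:

```python
import datetime


def avg_datetime_stdlib(values):
    """Return the mean of a non-empty iterable of datetime.datetime objects."""
    values = list(values)
    if not values:
        raise ValueError("need at least one datetime")
    origin = min(values)
    # Start sum() from a zero timedelta; its default start value 0 would
    # raise TypeError when added to a timedelta.
    total = sum((v - origin for v in values), datetime.timedelta())
    # timedelta / int is supported in Python 3 and rounds to whole microseconds.
    return origin + total / len(values)


dates = [
    datetime.datetime(2014, 7, 9),
    datetime.datetime(2014, 7, 10),
    datetime.datetime(2014, 7, 14),
]
print(avg_datetime_stdlib(dates))  # 2014-07-11 00:00:00
```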
    