Datetime objects with pandas mean function

一个人想着一个人 submitted on 2019-12-04 00:52:24

Question


I am new to programming, so I apologize in advance if this question does not make any sense. I noticed that when I try to calculate the mean of a pandas DataFrame containing datetime objects such as datetime.datetime(2014, 7, 10), the datetime column is left out of the result. However, pandas can calculate the minimum and maximum of that same DataFrame without a problem.

import datetime
import pandas as pd
from pandas import Series
d = {'one': Series([1, 2, 3], index=['a', 'b', 'c']), 'two': Series([datetime.datetime(2014, 7, 9), datetime.datetime(2014, 7, 10), datetime.datetime(2014, 7, 11)], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)

df
Out[18]: 
      one        two    
   a    1 2014-07-09
   b    2 2014-07-10
   c    3 2014-07-11

df.min()
Out[19]: 
   one             1
   two    2014-07-09
dtype: object

df.mean()
Out[20]: 
   one    2
dtype: float64

I did notice that the min and max functions converted all the columns to objects, whereas the mean function only outputs floats. Could anyone explain why the mean function can only handle floats? Is there another way to get the mean values of a DataFrame with a datetime column? I can work around it by using epoch time (as an integer), but it would be very convenient if there were a direct way. I use Python 2.7.

I am grateful for any hints.
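The epoch-time workaround mentioned above might look roughly like the following. This is a minimal sketch, assuming a reasonably recent pandas on Python 3: each datetime is turned into whole seconds since 1970-01-01, the integers are averaged, and the result is converted back with pd.to_datetime(..., unit="s"). The variable names are only illustrative.

```python
import datetime

import pandas as pd

s = pd.Series([datetime.datetime(2014, 7, 9),
               datetime.datetime(2014, 7, 10),
               datetime.datetime(2014, 7, 11)])

epoch = pd.Timestamp("1970-01-01")
# Whole seconds since the Unix epoch, as an integer Series
# (floor division drops any sub-second part)
epoch_secs = (s - epoch) // pd.Timedelta(seconds=1)
# Averaging plain integers works as usual; the result is a float
mean_secs = epoch_secs.mean()
# Convert the averaged epoch seconds back into a Timestamp
mean_ts = pd.to_datetime(mean_secs, unit="s")
print(mean_ts)
```

Because the floor division truncates to whole seconds, this loses sub-second precision; the timedelta-based answers below avoid that round trip.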


Answer 1:


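You can use datetime.timedelta arithmetic: subtract the earliest datetime from every value, average the resulting timedeltas, and add the average back to the earliest datetime.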

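import datetime
import functools
import operator

import pandas as pd
from pandas import Series

d = {'one': Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': Series([datetime.datetime(2014, 7, 9),
                    datetime.datetime(2014, 7, 10),
                    datetime.datetime(2014, 7, 11)], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)

def avg_datetime(series):
    # Use the earliest timestamp as the reference point
    dt_min = series.min()
    # Datetime minus datetime gives a timedelta, which supports + and /
    deltas = [x - dt_min for x in series]
    # Average the offsets and add the result back onto the reference point
    return dt_min + functools.reduce(operator.add, deltas) / len(deltas)

print(avg_datetime(df['two']))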



Answer 2:


To simplify Alex's answer (I would have added this as a comment but I don't have sufficient reputation):

import datetime
import pandas as pd

d={'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
   'two': pd.Series([datetime.datetime(2014, 7, 9), 
           datetime.datetime(2014, 7, 10), 
           datetime.datetime(2014, 7, 11) ], 
           index=['a', 'b', 'c'])}
df = pd.DataFrame(d)

Which looks like:

   one   two
a   1   2014-07-09
b   2   2014-07-10
c   3   2014-07-11

Then calculate the mean of column "two" with:

(df.two - df.two.min()).mean() + df.two.min()

That is: subtract the minimum of the time series (which turns each datetime into a timedelta), calculate the mean (or median) of those timedeltas, and add the minimum back.
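To illustrate the "(or median)" part, here is a small sketch of the same subtract-the-minimum trick computing both statistics. The dates are made up for illustration (the last one is deliberately 2014-07-12 rather than 2014-07-11) so that the mean and median differ.

```python
import datetime

import pandas as pd

two = pd.Series([datetime.datetime(2014, 7, 9),
                 datetime.datetime(2014, 7, 10),
                 datetime.datetime(2014, 7, 12)])

base = two.min()
# Offsets from the earliest timestamp: 0, 1 and 3 days (a timedelta Series)
offsets = two - base
# Timedelta Series support mean() and median(); add the result back to base
mean_ts = base + offsets.mean()
median_ts = base + offsets.median()
print(mean_ts, median_ts)
```

Here the mean offset is 4/3 days (1 day 8 hours), so the mean lands at 2014-07-10 08:00, while the median is 2014-07-10.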




Answer 3:


This issue is partially resolved as of pandas 0.25: mean can now be applied to a datetime Series, but not to a datetime column within a DataFrame.

In [1]: import pandas as pd

In [2]: s = pd.Series([pd.Timestamp(2014, 7, 9),
   ...:                pd.Timestamp(2014, 7, 10),
   ...:                pd.Timestamp(2014, 7, 11)])

In [3]: s.mean()
Out[3]: Timestamp('2014-07-10 00:00:00')

Applying .mean() to a DataFrame containing a datetime column, however, still leaves that column out, giving the same result as in the original question.

In [4]: df = pd.DataFrame({'numeric':[1,2,3],
   ...:               'datetime':s})

In [5]: df.mean()
Out[5]: 
numeric    2.0
dtype: float64
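Since Series.mean() handles datetimes while DataFrame.mean() (as of pandas 0.25) leaves them out, one possible workaround is to call Series.mean() on each column yourself. This is a minimal sketch, assuming pandas 0.25 or later; col_means is just an illustrative name.

```python
import datetime

import pandas as pd

df = pd.DataFrame({
    "numeric": [1, 2, 3],
    "datetime": [datetime.datetime(2014, 7, 9),
                 datetime.datetime(2014, 7, 10),
                 datetime.datetime(2014, 7, 11)],
})

# Call Series.mean() on each column separately so the datetime
# column is averaged instead of being dropped; the mixed float and
# Timestamp results end up in an object-dtype Series
col_means = pd.Series({name: col.mean() for name, col in df.items()})
print(col_means)
```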


Source: https://stackoverflow.com/questions/27907902/datetime-objects-with-pandas-mean-function
