Computing the mean for Python datetime

好久不见. Submitted on 2019-11-29 14:40:10

You can take the mean of a Timedelta series. So find the minimum value and subtract it from the datetime series to get a series of Timedelta values. Then take the mean of that and add it back to the minimum.

dob = df_test.DOB
m = dob.min()
(m + (dob - m).mean()).to_pydatetime()

datetime.datetime(2014, 7, 12, 0, 0)
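The same steps written out as a self-contained sketch. The sample `df_test` here is a hypothetical stand-in (the original data is not shown), chosen so the result matches the output above; the names `anchor` and `offsets` are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the question's df_test (assumption, not the original data)
df_test = pd.DataFrame(
    {"DOB": pd.to_datetime(["2014-07-09", "2014-07-15", None])}
)

dob = df_test["DOB"]
anchor = dob.min()                    # earliest date; min() skips NaT
offsets = dob - anchor                # Timedelta series; NaT stays NaT
mean_dob = anchor + offsets.mean()    # mean() skips NaT by default

print(mean_dob.to_pydatetime())
```

Any fixed Timestamp works as the anchor; the minimum just keeps the offsets small and non-negative.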

As a one-liner:

df_test.DOB.pipe(lambda d: (lambda m: m + (d - m).mean())(d.min())).to_pydatetime()

To @ALollz's point:

You can use the epoch, pd.Timestamp(0), as the reference point instead of the minimum:

df_test.DOB.pipe(lambda d: (lambda m: m + (d - m).mean())(pd.Timestamp(0))).to_pydatetime()

You can convert to epoch time (nanoseconds since 1970-01-01) using astype with np.int64, take the mean, and convert back to a datetime with pd.to_datetime:

pd.to_datetime(df_test.DOB.dropna().astype(np.int64).mean())

Output:

Timestamp('2014-07-12 00:00:00')
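A slightly more defensive variant of the integer approach, as a sketch under assumptions: the sample series is hypothetical, and the explicit cast to `datetime64[ns]` is only there so the integers are nanoseconds regardless of the unit pandas picked when parsing.

```python
import pandas as pd

# Hypothetical sample data; the question's df_test is not shown
dob = pd.Series(pd.to_datetime(["2014-07-09", "2014-07-15", None]))

# Drop NaT first: a missing value has no meaningful integer representation
valid = dob.dropna().astype("datetime64[ns]")

ns = valid.astype("int64")                 # nanoseconds since 1970-01-01
mean_ns = int(round(float(ns.mean())))     # mean is a float; round back to an integer
mean_dob = pd.to_datetime(mean_ns, unit="ns")

print(mean_dob)
```

Because the mean goes through float64, values this large are only resolved to within a few hundred nanoseconds, which is usually irrelevant for dates but worth knowing for high-resolution timestamps.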

You could work with unix time if you want. This is defined as the total number of seconds (for instance) since 1970-01-01. With that, all of your times are simply floats, so it's very easy to do simple math on the columns.

import pandas as pd

df_test['unix_time'] = (df_test.DOB - pd.to_datetime('1970-01-01')).dt.total_seconds()

df_test['unix_time'].mean()
#1405123200.0

# You want it in date, so just convert back
pd.to_datetime(df_test['unix_time'].mean(), origin='unix', unit='s')
#Timestamp('2014-07-12 00:00:00')

Datetime math supports some standard operations:

import datetime

a = datetime.datetime(2014, 7, 9)
b = datetime.datetime(2014, 7, 15)
c = (b - a)/2

# here c will be datetime.timedelta(3)

a + c
Out[7]: datetime.datetime(2014, 7, 12, 0, 0)

So you can write a function that, given two datetimes, subtracts the lesser from the greater and adds half of the difference to the lesser. Apply this function to your dataframe, and shazam!
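A minimal sketch of such a function using only the standard library. The names `midpoint` and `mean_datetime` are hypothetical, and the second function is an assumed generalization of the same idea to more than two values, not something the answer spells out.

```python
import datetime


def midpoint(a, b):
    """Return the datetime halfway between a and b, in either order."""
    lesser, greater = min(a, b), max(a, b)
    return lesser + (greater - lesser) / 2


def mean_datetime(values):
    """Average any number of datetimes by averaging offsets from the earliest."""
    values = list(values)
    anchor = min(values)
    # Sum the timedeltas, starting from a zero timedelta rather than int 0
    total = sum((v - anchor for v in values), datetime.timedelta())
    return anchor + total / len(values)


a = datetime.datetime(2014, 7, 9)
b = datetime.datetime(2014, 7, 15)
print(midpoint(a, b))
print(mean_datetime([a, b]))
```

Note that chaining `midpoint` pairwise over more than two values does not give the overall mean, which is why the offsets are summed once and divided by the count.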

As of pandas 0.25, it is possible to compute the mean of a datetime series directly; NaT values are skipped.

In [1]: import pandas as pd
   ...: import numpy as np

In [2]: # pd.datetime was removed in later pandas releases; pd.Timestamp works everywhere
   ...: s = pd.Series([
   ...:     pd.Timestamp(2014, 7, 9),
   ...:     pd.Timestamp(2014, 7, 15),
   ...:     np.datetime64('NaT')])

In [3]: s.mean()
Out[3]: Timestamp('2014-07-12 00:00:00')

However, note that as of that release, calling mean on a whole pandas DataFrame ignores datetime columns, so you need to take the mean of each datetime column yourself.
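One way to average datetime columns regardless of how DataFrame.mean treats them is to select them by dtype and call Series.mean on each. This is a sketch with a hypothetical frame; the column names and the `datetime_means` dict are illustrative, and it assumes a pandas version where Series.mean supports datetimes (0.25 or later, per the above).

```python
import pandas as pd

# Hypothetical frame with one datetime column and one numeric column
df = pd.DataFrame({
    "DOB": pd.to_datetime(["2014-07-09", "2014-07-15"]),
    "score": [1.0, 3.0],
})

# Pick out the datetime columns, then average each one as a Series
datetime_cols = df.select_dtypes(include="datetime").columns
datetime_means = {col: df[col].mean() for col in datetime_cols}

print(datetime_means)
```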
