Sum overflow TimeDeltas in Python Pandas

Posted by 社会主义新天地 on 2019-12-08 12:37:27

Question


Summing timedeltas in pandas works for a slice of the column, but fails with an overflow error for the whole column.

>> d.loc[0:100, 'VOID-DAYS'].sum()
Timedelta('2113 days 00:00:00')

>> d['VOID-DAYS'].sum()


ValueError: overflow in timedelta operation

Answer 1:


If VOID-DAYS represents an integer number of days, convert the Timedeltas into integers:

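df['VOID-DAYS'] = df['VOID-DAYS'].dt.days

For example, summing 106752 one-day Timedeltas overflows, but summing their integer day counts does not: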

import numpy as np
import pandas as pd
df = pd.DataFrame({'VOID-DAYS': pd.to_timedelta(np.ones((106752,)), unit='D')})
try:
    print(df['VOID-DAYS'].sum())
except ValueError as err:
    print(err)
    # overflow in timedelta operation


df['VOID-DAYS'] = df['VOID-DAYS'].dt.days
print(df['VOID-DAYS'].sum())
# 106752

If the Timedeltas include seconds or smaller units, then use

df['VOID-DAYS'] = df['VOID-DAYS'].dt.total_seconds()

to convert the value to a float.
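As a minimal sketch of the float-seconds route, the snippet below sums the same 106752 one-day values used in the example above. The variable names (`s`, `total_seconds`) are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same data as the example above: 106752 Timedeltas of one day each.
s = pd.Series(pd.to_timedelta(np.ones(106752), unit='D'))

# total_seconds() returns float64 values, so the sum is ordinary float
# arithmetic and is not bound by the int64 nanosecond limit.
total_seconds = s.dt.total_seconds().sum()

print(total_seconds)          # 9223372800.0
print(total_seconds / 86400)  # 106752.0 (total expressed in days)
```

The result stays a plain float. Converting it back with `pd.to_timedelta(total_seconds, unit='s')` would presumably hit the same limit, since the total still exceeds what a nanosecond-resolution Timedelta can hold.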


pandas Timedelta containers (Series and TimedeltaIndex) store all timedeltas as integers compatible with NumPy's timedelta64[ns] dtype. This dtype uses 8-byte (64-bit) signed integers to store each timedelta as a count of nanoseconds.
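A quick way to see this representation is to inspect the integer behind a single Timedelta. This is a small sketch. `one_day`, `ns_per_day` and `int64_max` are illustrative names.

```python
import numpy as np
import pandas as pd

one_day = pd.Timedelta(days=1)
ns_per_day = 24 * 3600 * 10**9

# Timedelta.value exposes the underlying integer count of nanoseconds.
print(one_day.value)                # 86400000000000
print(one_day.value == ns_per_day)  # True

# Any sum has to fit below the largest signed 64-bit integer.
int64_max = np.iinfo(np.int64).max
print(int64_max)                    # 9223372036854775807
```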

The largest number of days representable in this format is

In [73]: int(float(np.iinfo(np.int64).max) / (10**9 * 3600 * 24))
Out[73]: 106751

Which is why

In [74]: pd.Series(pd.to_timedelta(np.ones((106752,)), unit='D')).sum()
ValueError: overflow in timedelta operation

raises a ValueError, but

In [75]: pd.Series(pd.to_timedelta(np.ones((106751,)), unit='D')).sum()
Out[75]: Timedelta('106751 days 00:00:00')

does not.
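If sub-day precision matters and a float total is not acceptable, one possible alternative (not part of the original answer) is to add up the raw nanosecond counts with Python's arbitrary-precision integers. The sketch below assumes the column contains no NaT values. All variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(pd.to_timedelta(np.ones(106752), unit='D'))

# Reinterpret the timedeltas as raw nanosecond counts (int64).
# Assumption: no NaT entries, which would appear as the int64 minimum.
ns_values = s.to_numpy(dtype='timedelta64[ns]').astype('int64')

# tolist() yields Python ints, so the built-in sum() cannot overflow.
total_ns = sum(ns_values.tolist())

ns_per_day = 24 * 3600 * 10**9
days, remainder_ns = divmod(total_ns, ns_per_day)
print(days, remainder_ns)  # 106752 0
```

The total is an ordinary Python int, so it keeps full nanosecond precision at the cost of a slower, non-vectorized sum.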



Source: https://stackoverflow.com/questions/36766546/sum-overflow-timedeltas-in-python-pandas
