Aggregations for Timedelta values in the Python DataFrame

Posted by 百般思念 on 2019-12-12 18:44:12

Question


I have a big DataFrame (df) which looks like this:

  Acc_num date_diff
0   29  0:04:43
1   29  0:01:43
2   29  2:22:45
3   29  0:16:21
4   29  0:58:20
5   30  0:00:35
6   34  7:15:26
7   34  4:40:01
8   34  0:56:02
9   34  6:53:44
10  34  1:36:58
.....
Acc_num                    int64
date_diff        timedelta64[ns]
dtype: object

I need to calculate the mean of 'date_diff' (as a timedelta) for each account number.
df.date_diff.mean() works correctly. But when I try the grouped version,
df.groupby('Acc_num').date_diff.mean(), it raises an exception:

"DataError: No numeric types to aggregate"

I also tried the df.pivot_table() method, but didn't achieve anything.

Could someone help me with this? Thank you in advance!


Answer 1:


Weird limitation indeed. But a simple solution would be:

df.groupby('Acc_num').date_diff.agg(lambda g:g.sum()/g.count())
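
For anyone who wants to try this out, here is a minimal sketch of that workaround. The frame is rebuilt from the eleven sample rows shown in the question, and the name mean_by_acc is only illustrative. Each group's sum of timedeltas is divided by its row count, so the result stays in timedelta form.

```python
import pandas as pd

# Sample rows copied from the question; date_diff is parsed into timedelta64[ns].
df = pd.DataFrame({
    "Acc_num": [29, 29, 29, 29, 29, 30, 34, 34, 34, 34, 34],
    "date_diff": pd.to_timedelta([
        "0:04:43", "0:01:43", "2:22:45", "0:16:21", "0:58:20",
        "0:00:35",
        "7:15:26", "4:40:01", "0:56:02", "6:53:44", "1:36:58",
    ]),
})

# Per-group mean computed by hand: sum of the timedeltas divided by the row count.
mean_by_acc = df.groupby("Acc_num").date_diff.agg(lambda g: g.sum() / g.count())
print(mean_by_acc)
```

Note that g.count() skips missing values, so NaT rows are left out of both the sum and the divisor.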

Edit:
Pandas will actually attempt to aggregate non-numeric columns if you pass numeric_only=False:

df.groupby('Acc_num').date_diff.mean(numeric_only=False)
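
As a rough check, the sketch below runs the numeric_only=False call next to the manual sum/count version on a tiny made-up frame (three rows borrowed from the question). The names via_flag and via_lambda are only illustrative. How mean() treats timedelta columns by default has varied between pandas releases, so treat this as something to verify on your own version rather than a guarantee.

```python
import pandas as pd

# Tiny illustrative frame: two rows for account 29, one for account 30.
df = pd.DataFrame({
    "Acc_num": [29, 29, 30],
    "date_diff": pd.to_timedelta(["0:04:43", "0:01:43", "0:00:35"]),
})

# Ask the grouped mean to include non-numeric (timedelta) data.
via_flag = df.groupby("Acc_num").date_diff.mean(numeric_only=False)

# The manual sum/count workaround, for comparison.
via_lambda = df.groupby("Acc_num").date_diff.agg(lambda g: g.sum() / g.count())

print(via_flag)
print(via_lambda)
```

Both approaches should give the same per-account timedeltas; if they do, the shorter numeric_only=False form is probably the one to keep.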


Source: https://stackoverflow.com/questions/45239742/aggregations-for-timedelta-values-in-the-python-dataframe
