Resampling with 'how=count' causing problems

Anonymous (unverified), submitted 2019-12-03 03:10:03

Question:

I have a simple pandas dataframe that has measurements at various times:

                     volume
t
2013-10-13 02:45:00      17
2013-10-13 05:40:00      38
2013-10-13 09:30:00      29
2013-10-13 11:40:00      25
2013-10-13 12:50:00      11
2013-10-13 15:00:00      17
2013-10-13 17:10:00      15
2013-10-13 18:20:00      12
2013-10-13 20:30:00      20
2013-10-14 03:45:00       9
2013-10-14 06:40:00      30
2013-10-14 09:40:00      43
2013-10-14 11:05:00      10

I'm doing some basic resampling and plotting, such as the daily total volume, which works fine:

df.resample('D',how='sum').head()

            volume
t
2013-10-13     184
2013-10-14     209
2013-10-15     197
2013-10-16     309
2013-10-17     317

But for some reason, when I try to get the total number of entries per day, it returns a MultiIndex Series instead of a DataFrame:

df.resample('D',how='count').head()

2013-10-13  volume     9
2013-10-14  volume     9
2013-10-15  volume     7
2013-10-16  volume     9
2013-10-17  volume    10

I can fix the data so it's easily plotted with a simple unstack call, i.e. df.resample('D',how='count').unstack(), but why does calling resample with how='count' have a different behavior than with how='sum'?

Answer 1:

Combining resample with count does appear to produce an oddly structured result (at least up to pandas 0.13.1). See here for a slightly different but related context: "Count and Resampling with a mutli-ndex"

You can use the same strategy here:

>>> df
                     volume
date
2013-10-13 02:45:00      17
2013-10-13 05:40:00      38
2013-10-13 09:30:00      29
2013-10-13 11:40:00      25
2013-10-13 12:50:00      11
2013-10-13 15:00:00      17
2013-10-13 17:10:00      15
2013-10-13 18:20:00      12
2013-10-13 20:30:00      20
2013-10-14 03:45:00       9
2013-10-14 06:40:00      30
2013-10-14 09:40:00      43
2013-10-14 11:05:00      10

So here is your issue:

>>> df.resample('D',how='count')
2013-10-13  volume    9
2013-10-14  volume    4

You can fix the issue by passing a dict to the resample call that maps the volume column to count, so the result comes back as a DataFrame:

>>> df.resample('D',how={'volume':'count'})
            volume
date
2013-10-13       9
2013-10-14       4
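For readers on a later pandas release, where the how= keyword was deprecated and eventually removed, the same counts can be obtained by calling the aggregation as a method on the resampler. The following is a minimal sketch rather than part of the original answer: it assumes a recent pandas version, rebuilds only the 13 sample rows shown in the question, and the names daily_sum, daily_count and daily_both are chosen here purely for illustration.

```python
import pandas as pd

# Sample rows copied from the question (only the 13 rows that are shown).
times = pd.to_datetime([
    "2013-10-13 02:45", "2013-10-13 05:40", "2013-10-13 09:30",
    "2013-10-13 11:40", "2013-10-13 12:50", "2013-10-13 15:00",
    "2013-10-13 17:10", "2013-10-13 18:20", "2013-10-13 20:30",
    "2013-10-14 03:45", "2013-10-14 06:40", "2013-10-14 09:40",
    "2013-10-14 11:05",
])
volume = [17, 38, 29, 25, 11, 17, 15, 12, 20, 9, 30, 43, 10]
df = pd.DataFrame({"volume": volume}, index=pd.DatetimeIndex(times, name="t"))

# Method-style aggregation replaces how='sum' / how='count'.
daily_sum = df.resample("D").sum()      # DataFrame with a 'volume' column
daily_count = df.resample("D").count()  # also a DataFrame, not a MultiIndex Series

# Both statistics for one column at once (illustrative variable name).
daily_both = df.resample("D")["volume"].agg(["sum", "count"])

print(daily_count)
print(daily_both)
```

With this form, sum and count return results of the same shape, so the count can be plotted directly without an unstack call.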

