Pandas Downsampling Issue

Question

I have a CSV file with two columns, a timestamp and a 0/1 flag, like so:

17/08/2012 07:47:16 0
17/08/2012 07:54:31 1
17/08/2012 08:02:31 0
17/08/2012 09:22:33 0
17/08/2012 09:58:05 0
17/08/2012 12:26:59 1
17/08/2012 20:56:00 0
18/08/2012 10:04:06 0
18/08/2012 10:42:52 0
20/08/2012 07:22:02 0
20/08/2012 07:54:28 0
20/08/2012 08:01:58 0
20/08/2012 08:16:31 1
20/08/2012 08:26:38 0
20/08/2012 08:55:19 1
20/08/2012 09:00:09 0
20/08/2012 09:26:11 0
20/08/2012 09:50:10 0
20/08/2012 10:33:37 0
20/08/2012 10:39:13 0
20/08/2012 10:39:35 1
20/08/2012 11:15:07 1
20/08/2012 11:19:15 0
20/08/2012 11:21:01 0

I load this file into a DataFrame, raw_data, and then set its index to the parsed timestamps:

ts_data=raw_data.set_index(pd.to_datetime(raw_data.when_created,dayfirst=True))

I then try to downsample the data using:

daily_conversions=ts_data.resample('D',how='sum')

It works for all days (there are more than 7 months of data; I only include a subset here) except one, for which I get this output:

2012-08-20 NaN

This does not make sense, as you can see from the data. The interesting part is that if I downsample using a higher frequency such as 'h', I get correct results for that specific day: null values for the hours that are not present, 0 for the hours that are present but contain only 0s, and a correct sum for the hours that contain 1s. Any ideas, please?
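For reference, the hourly behaviour described above can be reproduced with a small sketch. It uses the current method-chaining syntax rather than the how= argument, and a handful of rows from 20/08/2012 typed in by hand. The min_count=1 argument is an assumption on my part: as far as I know, recent pandas versions return 0 rather than NaN for an empty bin unless it is set.

```python
import pandas as pd

# A few rows from 20/08/2012, copied from the sample in the question.
stamps = [
    "20/08/2012 07:22:02",
    "20/08/2012 07:54:28",
    "20/08/2012 08:16:31",
    "20/08/2012 08:55:19",
    "20/08/2012 11:15:07",
]
flags = [0, 0, 1, 1, 1]

series = pd.Series(flags, index=pd.to_datetime(stamps, dayfirst=True))

# min_count=1 keeps hours with no rows as NaN instead of 0, so that
# "no data" and "only zeros" can be told apart.
hourly = series.resample("h").sum(min_count=1)
print(hourly)
```

In this sketch the 07:00 bin holds only zeros and sums to 0, while the 09:00 and 10:00 bins have no rows and stay NaN.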

Answer 1:

After a helpful comment above, I realised what was wrong: it is just a matter of labelling. The day that should return NaN is actually the 19th, which has no rows at all, but the default setting was label='right', so that empty bin was being shown as the 20th. When I add label='left', it works fine. Thanks.
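A minimal sketch of the labelling effect, written against a recent pandas version. The column name converted, the inline CSV, and the explicit closed='right' are assumptions made to reproduce the behaviour described here; as far as I know, current pandas defaults to closed='left' and label='left' for daily bins, so the original symptom may not appear there without these arguments.

```python
import io
import pandas as pd

# Subset of the question's data; the header row is added for illustration.
csv_text = """when_created,converted
17/08/2012 20:56:00,0
18/08/2012 10:04:06,0
18/08/2012 10:42:52,0
20/08/2012 08:16:31,1
20/08/2012 08:55:19,1
20/08/2012 10:39:35,1
"""

raw_data = pd.read_csv(io.StringIO(csv_text))
when = pd.to_datetime(raw_data["when_created"], dayfirst=True)
ts_data = raw_data.set_index(when)["converted"]

# Same right-closed daily bins in both cases; only the label differs.
# min_count=1 makes a bin with no rows come out as NaN rather than 0.
by_right = ts_data.resample("D", closed="right", label="right").sum(min_count=1)
by_left = ts_data.resample("D", closed="right", label="left").sum(min_count=1)

print(by_right)  # the empty bin covering the 19th is labelled 2012-08-20
print(by_left)   # the same empty bin is labelled 2012-08-19
```

With label='right' each bin is named after its closing edge, so the empty day carries the next day's date. With label='left' the NaN moves to 2012-08-19 and the three conversions appear under 2012-08-20.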

Source: https://stackoverflow.com/questions/15821194/pandas-downsampling-issue

Tags

python

pandas

downsampling