Pandas timeseries resampling and interpolating together

怎甘沉沦 提交于 2019-11-30 18:23:27

问题


I have timestamped sensor data. Because of technical details, I get data from the sensors at approximately one minute intervals. The data may look like this:

   tstamp               val
0  2016-09-01 00:00:00  57
1  2016-09-01 00:01:00  57
2  2016-09-01 00:02:23  57
3  2016-09-01 00:03:04  57
4  2016-09-01 00:03:58  58
5  2016-09-01 00:05:00  60

Now, essentially, I would be extremely happy if I got all data at the exact minute, but I don't. The only way to conserve the distribution and have data at each minute is to interpolate. For example, between row indexes 1 and 2 there are 83 seconds, and the natural way to get a value at the exact minute is to interpolate between the two rows of data (in this case, it is 57, but that is not the case everywhere).

Right now, my approach is to do the following:

date = pd.to_datetime(df['measurement_tstamp'].iloc[0].date())
ts_d = df['measurement_tstamp'].dt.hour * 60 * 60 +\
       df['measurement_tstamp'].dt.minute * 60 +\
       df['measurement_tstamp'].dt.second
ts_r = np.arange(0, 24*60*60, 60)
data = scipy.interpolate.interp1d(x=ts_d, y=df['speed'].values)(ts_r)
req = pd.Series(data, index=pd.to_timedelta(ts_r, unit='s'))
req.index = date + req.index

But this feels rather drawn out and long to me. There are excellent pandas methods that do resampling, rounding, etc. I have been reading them all day, but it turns out that nothing does interpolation just the way I want it. resample works like a groupby and averages time points that fall together. fillna does interpolation, but not after resample has already altered the data by averaging.

Am I missing something, or is my approach the best there is?


For simplicity, assume that I group the data by day, and by sensor, so only a 24 hour period from one sensor is interpolated at a time.


回答1:


d = df.set_index('tstamp')
t = d.index
r = pd.date_range(t.min().date(), periods=24*60, freq='T')

d.reindex(t.union(r)).interpolate('index').ix[r]


Note, periods=24*60 works on daily data, not on the sample provided in the question. For that sample, periods=6 will work.



来源:https://stackoverflow.com/questions/40034040/pandas-timeseries-resampling-and-interpolating-together

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!