Converting irregularly time stamped measurements into equally spaced, time-weighted averages


I have series of measurements which are time stamped and irregularly spaced. Values in these series always represent changes of the measurement -- i.e. without a change no n


3 Answers

栀梦

2020-12-30 04:23
This is not an answer, but I need a graph to work out what you mean by the time-weighted average. Here is a graph plotted from your data:

Do you want the average value of every vertical span? The first span is 0-1; since it includes unknown data, the result is NaN. The second span is 1-2, and its value is calculated as (10*0.2 + 8*0.4 + 0*0.4), which is the same as yours. But I don't know where the value for 5-6 comes from:
```
23:00:06     2.8 ( 0*0.3 + 2*0.7 )
```
Can you explain how you calculate this value?
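For reference, here is a minimal sketch of the per-span calculation as I understand it, using only the standard library. The function name `time_weighted_means` and its parameters are made up for illustration, and the data points are the sample timestamps used elsewhere in this thread (with a value of 4 at 23:00:06.3, which is an assumption about the question's data). Each value is treated as holding until the next change, and a span that starts before the first known value is reported as `None`.

```python
from datetime import datetime, timedelta


def time_weighted_means(changes, start, n_buckets, width=timedelta(seconds=1)):
    """Average a step function (value holds until the next change) per bucket.

    changes: list of (timestamp, value) pairs, sorted by timestamp.
    Returns a list of (bucket_start, mean); mean is None when part of the
    bucket lies before the first change, i.e. the value is unknown there.
    """
    out = []
    for k in range(n_buckets):
        lo = start + k * width
        hi = lo + width
        if not changes or changes[0][0] > lo:
            out.append((lo, None))  # unknown data inside this bucket
            continue
        total = 0.0
        for i, (t, v) in enumerate(changes):
            # A value holds until the next change (or the bucket end).
            seg_end = changes[i + 1][0] if i + 1 < len(changes) else hi
            a, b = max(t, lo), min(seg_end, hi)
            if b > a:
                total += v * ((b - a) / width)  # weight = fraction of bucket
        out.append((lo, total))
    return out


changes = [
    (datetime(2016, 9, 27, 23, 0, 0, 100000), 10),
    (datetime(2016, 9, 27, 23, 0, 1, 200000), 8),
    (datetime(2016, 9, 27, 23, 0, 1, 600000), 0),
    (datetime(2016, 9, 27, 23, 0, 6, 300000), 4),
]
result = time_weighted_means(changes, datetime(2016, 9, 27, 23, 0, 0), 7)
for bucket_start, mean in result:
    print(bucket_start.time(), mean)
```

Under these assumptions the span starting at 23:00:01 comes to roughly 5.2 (10*0.2 + 8*0.4 + 0*0.4), and the span starting at 23:00:06 comes to roughly 2.8 (0*0.3 + 4*0.7), so the 2.8 would follow from a value of 4 rather than 2.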

轻奢々

2020-12-30 04:26

You can do this with traces.

```
from datetime import datetime
import traces

ts = traces.TimeSeries(data=[
    (datetime(2016, 9, 27, 23, 0, 0, 100000), 10),
    (datetime(2016, 9, 27, 23, 0, 1, 200000), 8),
    (datetime(2016, 9, 27, 23, 0, 1, 600000), 0),
    (datetime(2016, 9, 27, 23, 0, 6, 300000), 4),
])

regularized = ts.moving_average(
    start=datetime(2016, 9, 27, 23, 0, 1),
    sampling_period=1,
    placement='left',
)
```

This results in:

```
[(datetime(2016, 9, 27, 23, 0, 1), 5.2),
 (datetime(2016, 9, 27, 23, 0, 2), 0.0),
 (datetime(2016, 9, 27, 23, 0, 3), 0.0),
 (datetime(2016, 9, 27, 23, 0, 4), 0.0),
 (datetime(2016, 9, 27, 23, 0, 5), 0.0),
 (datetime(2016, 9, 27, 23, 0, 6), 2.8)]
```


情话喂你

2020-12-30 04:46

Here's a go at a solution; it may need some tweaking to meet your requirements.

Add the seconds to your index and fill forwards:

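```
from datetime import datetime
import pandas as pd
# df1 is the question's DataFrame (DatetimeIndex, 'value' column)
tees = pd.DatetimeIndex([datetime(2000, 1, 1, 23, 0, n) for n in range(8)])
df2 = df1.reindex(df1.index.union(tees))  # original timestamps plus whole seconds
df2['value'] = df2.value.ffill()
```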

```
In [14]: df2
Out[14]:
                            value
2000-01-01 23:00:00           NaN
2000-01-01 23:00:00.100000     10
2000-01-01 23:00:01            10
2000-01-01 23:00:01.200000      8
2000-01-01 23:00:01.600000      0
2000-01-01 23:00:02             0
2000-01-01 23:00:03             0
2000-01-01 23:00:04             0
2000-01-01 23:00:05             0
2000-01-01 23:00:06             0
2000-01-01 23:00:06.300000      4
2000-01-01 23:00:07             4
```

Take the time difference (using shift) until the next value, and multiply (value * seconds):

```
df3 = df2.reset_index()  # timestamps become an 'index' column (index is unnamed)
df3['difference'] = df3['index'].shift(-1) - df3['index']
# value * seconds held; a NaN value or NaT difference propagates to NaN
df3['tot'] = df3['difference'].dt.total_seconds() * df3['value']
```

```
In [17]: df3
Out[17]:
                        index  value      difference  tot
0         2000-01-01 23:00:00    NaN 00:00:00.100000  NaN
1  2000-01-01 23:00:00.100000     10 00:00:00.900000  9.0
2         2000-01-01 23:00:01     10 00:00:00.200000  2.0
3  2000-01-01 23:00:01.200000      8 00:00:00.400000  3.2
4  2000-01-01 23:00:01.600000      0 00:00:00.400000  0.0
5         2000-01-01 23:00:02      0        00:00:01  0.0
6         2000-01-01 23:00:03      0        00:00:01  0.0
7         2000-01-01 23:00:04      0        00:00:01  0.0
8         2000-01-01 23:00:05      0        00:00:01  0.0
9         2000-01-01 23:00:06      0 00:00:00.300000  0.0
10 2000-01-01 23:00:06.300000      4 00:00:00.700000  2.8
11        2000-01-01 23:00:07      4             NaT  NaN
```

Then do the resample to seconds (sum the value*seconds):

```
In [18]: df3.set_index('index')['tot'].resample('s').sum(min_count=1)
Out[18]:
index
2000-01-01 23:00:00    9.0
2000-01-01 23:00:01    5.2
2000-01-01 23:00:02    0.0
2000-01-01 23:00:03    0.0
2000-01-01 23:00:04    0.0
2000-01-01 23:00:05    0.0
2000-01-01 23:00:06    2.8
2000-01-01 23:00:07    NaN
Freq: S, dtype: float64
```

Note: The end points need some coercing (sum is being clever and ignoring the NaN). For example, the 23:00:00 bucket reports 9.0 even though its first 0.1 s is unknown.
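One possible way to do that coercing, sketched under the assumption that a bucket should come out as NaN whenever any part of it is unknown: sum each one-second bucket with `skipna=False`. The names `edges`, `seconds`, and `strict` are made up for this example, and `df1` is rebuilt from the sample values shown above so the snippet runs on its own. It should work on recent pandas versions, but treat it as a starting point and not a tested recipe.

```python
from datetime import datetime
import pandas as pd

# Sample data as used above: each value holds until the next change.
df1 = pd.DataFrame(
    {"value": [10.0, 8.0, 0.0, 4.0]},
    index=pd.DatetimeIndex([
        datetime(2000, 1, 1, 23, 0, 0, 100000),
        datetime(2000, 1, 1, 23, 0, 1, 200000),
        datetime(2000, 1, 1, 23, 0, 1, 600000),
        datetime(2000, 1, 1, 23, 0, 6, 300000),
    ]),
)

# Merge whole-second bucket edges into the index and forward-fill.
edges = pd.DatetimeIndex([datetime(2000, 1, 1, 23, 0, n) for n in range(8)])
df = df1.reindex(df1.index.union(edges))
s = df["value"].ffill()

# Seconds until the next row, then value * seconds for each row.
seconds = s.index.to_series().diff().shift(-1).dt.total_seconds()
tot = s * seconds

# Sum per second, but keep NaN if any piece of the bucket is NaN.
strict = tot.resample("s").agg(lambda x: x.sum(skipna=False))
print(strict)
```

Because the buckets are exactly one second wide, the per-bucket sum of value * seconds is already the time-weighted average. With this variant the 23:00:00 bucket becomes NaN instead of 9.0, while the 23:00:01 and 23:00:06 buckets stay at roughly 5.2 and 2.8.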
