Converting irregularly time stamped measurements into equally spaced, time-weighted averages

悲哀的现实 2020-12-30 03:45

I have series of measurements which are time stamped and irregularly spaced. Values in these series always represent changes of the measurement -- i.e. without a change no n

3 answers
  • 2020-12-30 04:23

    This is not an answer, but I needed a graph to work out what you mean by time-weighted averaging. Here is a plot of your data:

    [Image: plot of the data]

    Do you want the average value over each vertical span? The first span, 0-1, includes unknown data, so the result is NaN. The second span, 1-2, is calculated as (10*0.2 + 8*0.4 + 0*0.4) = 5.2, which matches yours. But I don't understand how the value for 5-6 comes about:

    23:00:06     2.8 ( 0*0.3 + 2*0.7 )
    

    Can you explain how you calculated this value?
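A minimal sketch of the per-span calculation described above, in plain Python. It assumes each timestamped value holds until the next change, takes times as seconds after 23:00:00, and uses the four measurements the answers below work with (10 at 00.1, 8 at 01.2, 0 at 01.6, 4 at 06.3). The function name time_weighted_bins is hypothetical. With 4 recorded at 23:00:06.3, the 6-7 span works out to 0*0.3 + 4*0.7 = 2.8.

```python
import math


def time_weighted_bins(points, start, n_bins, width=1.0):
    """Average a step signal over bins [start + k*width, start + (k+1)*width).

    Hypothetical helper. points: list of (time_in_seconds, value) sorted by time;
    each value is assumed to hold until the next point. Bins that begin before
    the first point contain unknown data and yield nan.
    """
    results = []
    for k in range(n_bins):
        lo = start + k * width
        hi = lo + width
        if lo < points[0][0]:
            results.append(math.nan)  # value before the first change is unknown
            continue
        total = 0.0
        for i, (t, v) in enumerate(points):
            # Segment i covers [t, next_t); the last segment runs on indefinitely.
            next_t = points[i + 1][0] if i + 1 < len(points) else math.inf
            overlap = min(hi, next_t) - max(lo, t)
            if overlap > 0:
                total += v * overlap  # value weighted by how long it lasted in the bin
        results.append(total / width)
    return results


# Seconds after 23:00:00 and the value recorded at that moment.
data = [(0.1, 10), (1.2, 8), (1.6, 0), (6.3, 4)]
averages = time_weighted_bins(data, start=0.0, n_bins=7)
print(averages)
```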

  • 2020-12-30 04:26

    You can do this with the traces library.

    from datetime import datetime
    import traces
    
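    # Each (timestamp, value) pair marks a change; the value holds until the next one.
    ts = traces.TimeSeries(data=[
        (datetime(2016, 9, 27, 23, 0, 0, 100000), 10),
        (datetime(2016, 9, 27, 23, 0, 1, 200000), 8),
        (datetime(2016, 9, 27, 23, 0, 1, 600000), 0),
        (datetime(2016, 9, 27, 23, 0, 6, 300000), 4),
    ])
    
    # Average over 1-second windows, each labelled by its left (start) edge.
    # Starting at 23:00:01 skips the first second, which begins before the first measurement.
    regularized = ts.moving_average(
        start=datetime(2016, 9, 27, 23, 0, 1),
        sampling_period=1,
        placement='left',
    )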
    

    Which results in:

    [(datetime(2016, 9, 27, 23, 0, 1), 5.2),
     (datetime(2016, 9, 27, 23, 0, 2), 0.0),
     (datetime(2016, 9, 27, 23, 0, 3), 0.0),
     (datetime(2016, 9, 27, 23, 0, 4), 0.0),
     (datetime(2016, 9, 27, 23, 0, 5), 0.0),
     (datetime(2016, 9, 27, 23, 0, 6), 2.8)]
    
  • 2020-12-30 04:46

    Here's an attempt at a solution; it may need some tweaking to meet your requirements.

    Add the whole seconds to your index and forward-fill the values:

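    import pandas as pd
    from datetime import datetime

    # df1 is the question's frame: a 'value' column indexed by timestamp.
    # Add the whole seconds 23:00:00 .. 23:00:07 to the index, then forward-fill.
    tees = pd.Index([datetime(2000, 1, 1, 23, 0, n) for n in range(8)])
    df2 = df1.reindex(df1.index.union(tees))
    df2['value'] = df2['value'].ffill()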
    
    In [14]: df2
    Out[14]:
                                value
    2000-01-01 23:00:00           NaN
    2000-01-01 23:00:00.100000     10
    2000-01-01 23:00:01            10
    2000-01-01 23:00:01.200000      8
    2000-01-01 23:00:01.600000      0
    2000-01-01 23:00:02             0
    2000-01-01 23:00:03             0
    2000-01-01 23:00:04             0
    2000-01-01 23:00:05             0
    2000-01-01 23:00:06             0
    2000-01-01 23:00:06.300000      4
    2000-01-01 23:00:07             4
    

    Take the time difference to the next timestamp (using shift) and multiply it by the value (value * seconds):

    # Duration each value lasts until the next timestamp, in seconds, times the value.
    # The last row has no successor (NaT), so its product is NaN.
    df3 = df2.reset_index()
    df3['difference'] = df3['index'].shift(-1) - df3['index']
    df3['tot'] = df3['difference'].dt.total_seconds() * df3['value']
    
    In [17]: df3
    Out[17]:
                            index  value      difference  tot
    0         2000-01-01 23:00:00    NaN 00:00:00.100000  NaN
    1  2000-01-01 23:00:00.100000     10 00:00:00.900000  9.0
    2         2000-01-01 23:00:01     10 00:00:00.200000  2.0
    3  2000-01-01 23:00:01.200000      8 00:00:00.400000  3.2
    4  2000-01-01 23:00:01.600000      0 00:00:00.400000  0.0
    5         2000-01-01 23:00:02      0        00:00:01  0.0
    6         2000-01-01 23:00:03      0        00:00:01  0.0
    7         2000-01-01 23:00:04      0        00:00:01  0.0
    8         2000-01-01 23:00:05      0        00:00:01  0.0
    9         2000-01-01 23:00:06      0 00:00:00.300000  0.0
    10 2000-01-01 23:00:06.300000      4 00:00:00.700000  2.8
    11        2000-01-01 23:00:07      4             NaT  NaN
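Why the duration needs total_seconds(): a datetime.timedelta stores days, seconds and microseconds as separate fields, so .microseconds is only the sub-second remainder, and value * microseconds / 1000000 treats a 1-second gap as 0 seconds. In this data that only goes unnoticed because every 1-second row has value 0. A small check, assuming pandas is installed:

```python
from datetime import timedelta

import pandas as pd

# A timedelta keeps whole seconds and microseconds in separate fields.
gap = timedelta(seconds=1, microseconds=300000)
sub_second = gap.microseconds  # 300000: the whole second is not included
full = gap.total_seconds()     # 1.3: the full duration
print(sub_second, full)

# Vectorised equivalent on a pandas column; NaT becomes NaN.
diffs = pd.Series(pd.to_timedelta(["00:00:00.9", "00:00:01", None]))
seconds = diffs.dt.total_seconds()
print(seconds.tolist())
```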
    

    Then resample to one-second bins, summing value * seconds in each:

    In [18]: df3.set_index('index')['tot'].resample('s').sum(min_count=1)
    Out[18]:
    index
    2000-01-01 23:00:00    9.0
    2000-01-01 23:00:01    5.2
    2000-01-01 23:00:02    0.0
    2000-01-01 23:00:03    0.0
    2000-01-01 23:00:04    0.0
    2000-01-01 23:00:05    0.0
    2000-01-01 23:00:06    2.8
    2000-01-01 23:00:07    NaN
    Freq: S, dtype: float64
    

    Note: the end points need some coercing. sum skips NaN, so 23:00:00 shows 9.0 even though the value before 23:00:00.1 is unknown.
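A self-contained sketch that folds the steps above into one function and also handles the end point the note mentions: bins that start before the first measurement are set to NaN, and min_count=1 keeps bins with no data as NaN rather than 0. The function name time_weighted_resample and the default 1-second bin width are hypothetical choices, not part of the answer; it assumes pandas 0.22 or later (for min_count) and that each value holds until the next timestamp.

```python
import pandas as pd


def time_weighted_resample(values: pd.Series, freq: str = "1s") -> pd.Series:
    """Time-weighted mean of a step signal in fixed-width bins (hypothetical helper).

    Assumes each value holds until the next timestamp. Bins that start
    before the first timestamp include unknown data and come back as NaN.
    """
    first, last = values.index[0], values.index[-1]
    # Bin edges spanning the data; inserting them splits every step at bin boundaries.
    edges = pd.date_range(first.floor(freq), last.ceil(freq), freq=freq)
    grid = values.reindex(values.index.union(edges)).ffill()
    # Seconds each grid value lasts until the next grid point (NaN for the last point).
    times = grid.index.to_series()
    duration = (times.shift(-1) - times).dt.total_seconds()
    # Sum value * seconds per bin; min_count=1 keeps all-NaN bins as NaN rather than 0.
    weighted = (grid * duration).resample(freq).sum(min_count=1)
    mean = weighted / pd.Timedelta(freq).total_seconds()
    # One row per bin start; blank out bins that begin before the first value.
    bins = edges[:-1]
    return mean.loc[bins].mask(bins < first)


values = pd.Series(
    [10, 8, 0, 4],
    index=pd.to_datetime([
        "2000-01-01 23:00:00.1",
        "2000-01-01 23:00:01.2",
        "2000-01-01 23:00:01.6",
        "2000-01-01 23:00:06.3",
    ]),
)
result = time_weighted_resample(values)
print(result)
```

For the sample data this gives NaN for 23:00:00, 5.2 for 23:00:01, 0.0 for 23:00:02 through 23:00:05 and 2.8 for 23:00:06, which agrees with the traces output plus an explicit NaN for the first second.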
