r calculating rolling average with window based on value (not number of rows or date/time variable)

前端 未结 4 1178
小蘑菇
小蘑菇 2021-01-18 05:55

I\'m quite new to all the packages meant for calculating rolling averages in R and I hope you can show me in the right direction.

I have the following data as an exa

4条回答
  •  长情又很酷
    2021-01-18 06:57

    For the sake of completeness, here is an answer which uses data.table to aggregate in a non-equi join.

    The OP has clarified in comments, that he is looking for a sliding window of 5 ms, i.e., windows that go 300-304, 301-305, 302-306 etc.

    As there is no data point with 302 ms in OP's data set, the missing values need to be filled up.

    library(data.table)
    ws <- 5   # define window size
    setDT(df)[SJ(start = seq(min(ms), max(ms), 1))[, end := start + ws - 1], 
              on = .(ms >= start, ms <= end),
              .(share_correct = mean(correct)), by = .EACHI]
    
         ms  ms share_correct
     1: 300 304     0.4000000
     2: 301 305     0.0000000
     3: 302 306     0.2500000
     4: 303 307     0.2500000
     5: 304 308     0.2500000
     6: 305 309     0.2500000
     7: 306 310     0.2500000
     8: 307 311     0.0000000
     9: 308 312     0.2000000
    10: 309 313     0.2500000
    11: 310 314     0.2000000
    12: 311 315     0.4000000
    13: 312 316     0.4285714
    14: 313 317     0.2857143
    15: 314 318     0.3750000
    16: 315 319     0.4285714
    17: 316 320     0.4285714
    18: 317 321     0.4000000
    19: 318 322     0.4000000
    20: 319 323     0.2500000
    21: 320 324     0.4000000
    22: 321 325     0.3333333
    23: 322 326     0.5000000
    24: 323 327     1.0000000
    25: 324 328     1.0000000
    26: 325 329     0.5000000
    27: 326 330     0.2000000
    28: 327 331     0.2000000
    29: 328 332     0.4285714
    30: 329 333     0.3333333
    31: 330 334     0.2857143
    32: 331 335     0.5000000
    33: 332 336     0.3750000
    34: 333 337     0.2857143
    35: 334 338     0.3000000
    36: 335 339     0.3750000
    37: 336 340     0.3750000
    38: 337 341     0.4285714
    39: 338 342     0.4000000
    40: 339 343     0.4285714
    41: 340 344     0.4285714
    42: 341 345     0.4000000
    43: 342 346     0.5000000
         ms  ms share_correct
    

    If the OP would be interested only in windows where the starting point exist in the dataset the code can be simplified:

    setDT(df)[SJ(start = unique(ms))[, end := start + ws - 1], 
              on = .(ms >= start, ms <= end),
              .(share_correct = mean(correct)), by = .EACHI]
    
         ms  ms share_correct
     1: 300 304     0.4000000
     2: 301 305     0.0000000
     3: 303 307     0.2500000
     4: 305 309     0.2500000
     5: 306 310     0.2500000
     6: 308 312     0.2000000
     7: 310 314     0.2000000
     8: 311 315     0.4000000
     9: 312 316     0.4285714
    10: 314 318     0.3750000
    11: 315 319     0.4285714
    12: 316 320     0.4285714
    13: 317 321     0.4000000
    14: 318 322     0.4000000
    15: 320 324     0.4000000
    16: 321 325     0.3333333
    17: 322 326     0.5000000
    18: 324 328     1.0000000
    19: 328 332     0.4285714
    20: 329 333     0.3333333
    21: 330 334     0.2857143
    22: 332 336     0.3750000
    23: 334 338     0.3000000
    24: 335 339     0.3750000
    25: 336 340     0.3750000
    26: 337 341     0.4285714
    27: 338 342     0.4000000
    28: 340 344     0.4285714
    29: 341 345     0.4000000
    30: 342 346     0.5000000
         ms  ms share_correct
    

    In both cases, a data.table containing the intervals [start, end] is created on the fly and right joined to df. During the non-equi join, the intermediate result is immediately grouped by the join parameters (by = .EACHI) and aggregated. Note that closed intervals are used to be in line with OP's expectations.

提交回复
热议问题