Calculate Run Length Sequence and Maximum by Subject ID

Submitted by 喜你入骨 on 2021-02-20 04:36:12

Question


We have time series data in which repeated observations were measured for several subjects. I would like to calculate the number of occasions in which the variable positive == 1 occurs for each subject (variable id).

A second aim is to identify the length of each run of consecutive observations in which positive == 1. For each subject there are likely to be multiple runs within the study period. Rather than calculating a single maximum number of consecutive positive observations per subject, I would like to calculate the maximum run length reached within each individual run.

Here is a toy data set that illustrates the problem:

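library(plyr)  # provides count()

set.seed(1234)
test <- data.frame(id = rep(1:3, each = 10), positive = round(runif(30, 0, 1)))
# Position of each observation within its run (not yet split by subject)
test$run <- sequence(rle(test$positive)$lengths)
# Keep the run counter for positive observations only
test$run_positive <- ifelse(test$positive == 0, 0, test$run)
# Flag the first observation of each positive run
test$episode <- ifelse(test$run_positive == 1, 1, 0)

count(test$episode)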
  x freq
1 0   25
2 1    5

The code above gets close to answering my first question, in which I am attempting to count the number of positive episodes. However, it is not conditioned by subject. This has the unfortunate effect of counting the last observation of Subject #1 and the first observation of Subject #2 in the same run. Can anyone help me develop code to condition this run length encoding by subject?

Secondly, how can one extract only the maximum run length for each run in which positive == 1? I would like to add a column that records the run length only on the observation where each run reaches its maximum, and 0 everywhere else. For Subject #1, this would look like:

   id positive run run_positive episode max_run
1   1        0   1            0       0       0
2   1        1   1            1       1       0
3   1        1   2            2       0       0
4   1        1   3            3       0       0
5   1        1   4            4       0       0
6   1        1   5            5       0       5
7   1        0   1            0       0       0
8   1        0   2            0       0       0
9   1        1   1            1       1       0
10  1        1   2            2       0       2

If anyone can come up with a method to do this I would be extremely grateful.


Answer 1:


I think this answers your first question. It sums positive within each id, so it gives the number of positive observations per subject:

aggregate(positive ~ id, data = test, FUN = sum)

  id positive
1  1        7
2  2        4
3  3        4

This might answer your second question, but I would need to see the desired result for each id to check:

set.seed(1234)
test <- data.frame(id = rep(1:3, each = 10), positive = round(runif(30, 0, 1)))
test$run <- sequence(rle(test$positive)$lengths)
test$run_positive <- ifelse(test$positive == 0, 0, test$run)
test$episode <- ifelse(test$run_positive == 1, 1, 0)

# Key that changes whenever id or positive changes, e.g. "101" = id 1, positive 1
test$group <- paste(test$id*10, test$positive, sep='')

# Count up (first) and down (last) within each run of identical keys
my.seq <- data.frame(rle(test$group)$lengths)
test$first <- unlist(apply(my.seq, 1, function(x) seq(1,x)))
test$last  <- unlist(apply(my.seq, 1, function(x) seq(x,1,-1)))

# Record the run counter on the last observation of each positive run
test$max <- ifelse(test$last == 1 & test$positive==1, test$run, 0)
test

   id positive run run_positive episode group first last max
1   1        0   1            0       0   100     1    1   0
2   1        1   1            1       1   101     1    5   0
3   1        1   2            2       0   101     2    4   0
4   1        1   3            3       0   101     3    3   0
5   1        1   4            4       0   101     4    2   0
6   1        1   5            5       0   101     5    1   5
7   1        0   1            0       0   100     1    2   0
8   1        0   2            0       0   100     2    1   0
9   1        1   1            1       1   101     1    2   0
10  1        1   2            2       0   101     2    1   2
11  2        1   3            3       0   201     1    2   0
12  2        1   4            4       0   201     2    1   4
13  2        0   1            0       0   200     1    1   0
14  2        1   1            1       1   201     1    1   1
15  2        0   1            0       0   200     1    1   0
16  2        1   1            1       1   201     1    1   1
17  2        0   1            0       0   200     1    4   0
18  2        0   2            0       0   200     2    3   0
19  2        0   3            0       0   200     3    2   0
20  2        0   4            0       0   200     4    1   0
21  3        0   5            0       0   300     1    5   0
22  3        0   6            0       0   300     2    4   0
23  3        0   7            0       0   300     3    3   0
24  3        0   8            0       0   300     4    2   0
25  3        0   9            0       0   300     5    1   0
26  3        1   1            1       1   301     1    4   0
27  3        1   2            2       0   301     2    3   0
28  3        1   3            3       0   301     3    2   0
29  3        1   4            4       0   301     4    1   4
30  3        0   1            0       0   300     1    1   0
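
One thing to note in the output above: run is still computed over the whole data frame, so rows 11 and 12 carry on from Subject #1 (run = 3, 4), and max at row 12 is 4 even though the run inside Subject #2 has length 2. A minimal sketch of one way to restart the counters for every subject, assuming base R's ave(). The column name max_run follows the question, longest is a hypothetical name, and this is only one possible approach:

```r
set.seed(1234)
test <- data.frame(id = rep(1:3, each = 10), positive = round(runif(30, 0, 1)))

# Position within the current run, restarting at every new subject
test$run <- ave(test$positive, test$id,
                FUN = function(x) sequence(rle(x)$lengths))

# Countdown to the end of the current run, also computed per subject
test$last <- ave(test$positive, test$id,
                 FUN = function(x) {
                   len <- rle(x)$lengths
                   rep(len, len) - sequence(len) + 1
                 })

# Record the run length only on the final observation of each positive run
test$max_run <- ifelse(test$positive == 1 & test$last == 1, test$run, 0)

# Longest positive run per subject
longest <- aggregate(max_run ~ id, data = test, FUN = max)
longest
```

For Subject #1 this reproduces the max_run column requested in the question (5 at row 6, 2 at row 10), and row 12 now gets 2 instead of 4.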


Source: https://stackoverflow.com/questions/18669123/calculate-run-length-sequence-and-maximum-by-subject-id
