Select row prior to first occurrence of an event by group

依然范特西╮ 提交于 2019-12-11 00:01:35

问题


I have a series of observations that describe if and when an animal is spotted in a specific area. The following sample table identifies when a certain animal is seen (status == 1) or not (status == 0) by day.

   id       date status
1   1 2014-06-20      1
2   1 2014-06-21      1
3   1 2014-06-22      1
4   1 2014-06-23      1
5   1 2014-06-24      0
6   2 2014-06-20      1
7   2 2014-06-21      1
8   2 2014-06-22      0
9   2 2014-06-23      1
10  2 2014-06-24      1
11  3 2014-06-20      1
12  3 2014-06-21      1
13  3 2014-06-22      0
14  3 2014-06-23      1
15  3 2014-06-24      0
16  4 2014-06-20      1
17  4 2014-06-21      0
18  4 2014-06-22      0
19  4 2014-06-23      0
20  4 2014-06-24      1

Using the data.table package, I can identify the first day an animal is no longer seen in the area:

library(data.table)
dt <- as.data.table(df)
dt[status == 0, .SD[1], by = id]
  id       date status
1:  1 2014-06-24      0
2:  2 2014-06-22      0
3:  3 2014-06-22      0
4:  4 2014-06-21      0

While the above table is useful, I would like to know how to manipulate the function to find the dates prior to first occurrence of an animal's absence. In other words, I want to know the last day that each animal is in the area before temporarily leaving.

My actual data set bins these presence/absence observations into different time lengths depending on the situation (e.g. presence/absence by 3-hour intervals, 6-hour, etc). Therefore, it would be easier to access the previous row rather than subtract the time interval from each value since it always changes. My desired output would be the following:

  id       date status
1:  1 2014-06-23      1
2:  2 2014-06-21      1
3:  3 2014-06-21      1
4:  4 2014-06-20      1

Please feel free to use base code or other packages (i.e. dplyr) to answer this question, I am always up for something new. Thank you for your time!


回答1:


Try the following:

dt[dt[status == 0, .I[1] - 1, by = id]$V1]
#   id       date status
#1:  1 2014-06-23      1
#2:  2 2014-06-21      1
#3:  3 2014-06-21      1
#4:  4 2014-06-20      1

Incidentally, this method (using .I instead of .SD) will also be much faster. See this post for more on that.




回答2:


Here is a method via dplyr :

df %>% 
  group_by(id) %>%
  mutate(status_change = status - lead(status)) %>%
  filter(status_change == 1)
  id       date status status_change
1  1 2014-06-23      1             1
2  2 2014-06-21      1             1
3  3 2014-06-21      1             1
4  3 2014-06-23      1             1
5  4 2014-06-20      1             1

This takes advantage of status being a numeric variable. lead() accesses the next value; the change is 1 when and animal disappears.



来源:https://stackoverflow.com/questions/24441367/select-row-prior-to-first-occurrence-of-an-event-by-group

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!