Select rows before a filtered row using dplyr

て烟熏妆下的殇ゞ 提交于 2019-12-11 02:03:39

问题


I'm working on a study where we used a camera placed inside a nest box to determine when our study species laid its first egg. Some of the cameras weren't super reliable, and I'd like to see if there were continuous photos before the date where the first egg was laid. This way I can no for sure that this is the first egg date. There are >165,000 photos and >200 nests, so I grouped by nest box ID, filtered the rows down to those that have at least 1 egg, and then used the slice function to select the first row with data. Here's a reproducible example:

example <- structure(list(boxID = c("CA10", "CA10", "CA10", "CA10", "CA10", 
    "CA10", "CA10", "CA10", "CA10", "CA10", "CA10", "CA10", "CA10", 
    "CA10", "CA10"), visitType = c("Image", "Image", "Image", "Image", 
    "Image", "Image", "Image", "Image", "Image", "Image", "Image", 
    "Image", "Image", "Image", "Image"), day = c(25L, 25L, 25L, 26L, 
    26L, 26L, 27L, 27L, 27L, 28L, 28L, 28L, 29L, 29L, 29L), month = c("MAR", 
    "MAR", "MAR", "MAR", "MAR", "MAR", "MAR", "MAR", "MAR", "MAR", 
    "MAR", "MAR", "MAR", "MAR", "MAR"), year = c(2018, 2018, 2018, 
    2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 
    2018), timeChecked = c("02:59", "09:06", "15:13", "02:59", "09:07", 
    "15:14", "02:59", "09:07", "15:13", "02:58", "09:06", "15:12", 
    "02:58", "09:06", "15:12"), species = c("Empty", "Empty", "Empty", 
    "Empty", "Empty", "Empty", "Empty", "Empty", "American Kestrel", 
    "Empty", "American Kestrel", "American Kestrel", "American Kestrel", 
    "American Kestrel", "American Kestrel"), sexAdult = c(NA, NA, 
    NA, NA, NA, NA, NA, NA, "Female", NA, "Female", "Female", "Female", 
    NA, NA), numEggs = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    "1", "1", "1", "1", "1"), numNestlings = c(NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), date = structure(c(17615, 
    17615, 17615, 17616, 17616, 17616, 17617, 17617, 17617, 17618, 
    17618, 17618, 17619, 17619, 17619), class = "Date")), class = c("tbl_df", 
    "tbl", "data.frame"), row.names = c(NA, -15L), .Names = c("boxID", 
    "visitType", "day", "month", "year", "timeChecked", "species", 
    "sexAdult", "numEggs", "numNestlings", "date"))

Here's the code I have to find the first row that has at least 1 egg:

example %>%
      mutate_at(vars(numEggs, numNestlings), na_if, 'unknown') %>% # remove unknowns and other values that should be NA
      select(boxID, date, numEggs, visitType) %>%
      group_by(boxID) %>%
      filter(numEggs > 0) %>%
      slice(1) 

I'd like to look at the 5 or 10 rows before this first row with an egg to make sure there was continuous data up to this point in time. Is there a way to do this row indexing with slice or some other dplyr function?


回答1:


Here's one way. match returns the position of first numEggs > 0 and then we simply get additional n_previous rows from that position. We use max(1, ...) so that we don't get error if position of first numEggs > 0 < n_previous .

n_previous <- 5

example %>%
  mutate_at(vars(numEggs, numNestlings), na_if, 'unknown') %>% 
  select(boxID, date, numEggs, visitType) %>%
  group_by(boxID) %>%
  slice(max(1, match(TRUE, numEggs > 0) - n_previous):match(TRUE, numEggs > 0))

# A tibble: 6 x 4
# Groups:   boxID [1]
  boxID date       numEggs visitType
  <chr> <date>     <chr>   <chr>    
1 CA10  2018-03-26 <NA>    Image    
2 CA10  2018-03-27 <NA>    Image    
3 CA10  2018-03-27 <NA>    Image    
4 CA10  2018-03-27 <NA>    Image    
5 CA10  2018-03-28 <NA>    Image    
6 CA10  2018-03-28 1       Image  



回答2:


Here a way based on slicing according to the position of the first non missing value of numEggs. You can modify the 5 value inside the last row according to the rows you want retain before the first non-NA numEggs

example %>%
      mutate_at(vars(numEggs, numNestlings), na_if, 'unknown') %>%
      select(boxID, date, numEggs, visitType) %>%
      group_by(boxID) %>%
      slice((min(which(!is.na(numEggs)))-5):min(which(!is.na(numEggs))))

    # A tibble: 6 x 4
# Groups:   boxID [1]
  boxID date       numEggs visitType
  <chr> <date>     <chr>   <chr>    
1 CA10  2018-03-26 NA      Image    
2 CA10  2018-03-27 NA      Image    
3 CA10  2018-03-27 NA      Image    
4 CA10  2018-03-27 NA      Image    
5 CA10  2018-03-28 NA      Image    
6 CA10  2018-03-28 1       Image 


来源:https://stackoverflow.com/questions/53622509/select-rows-before-a-filtered-row-using-dplyr

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!