R: Selecting first of n consecutive rows above a certain threshold value

迷失自我 2020-12-10 20:34

I have a data frame with MRN, dates, and a test value.

I need to select all the first rows per MRN that have three

4 Answers
  •  一整个雨季
    2020-12-10 21:06

    The easiest way is to use the zoo package together with dplyr. zoo provides a function called rollapply, which applies a function over a rolling window of consecutive values.

    In this example, we can use a window of width 3 to calculate the minimum of each value and the two that follow it, and then keep the rows where that minimum meets the threshold.

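    library(dplyr)
    library(zoo)

    df %>% group_by(MRN) %>%
      # replace ANC with the minimum of the current value and the next two
      mutate(ANC=rollapply(ANC, width=3, min, align="left", fill=NA, na.rm=TRUE)) %>%
      filter(ANC >= 0.5) %>%       # keep rows whose window minimum reaches the threshold
      filter(row_number() == 1)    # first qualifying row per MRN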
    
    #   MRN Collected_Date   ANC
    # 1 001     2015-01-03 0.532
    # 2 004     2014-01-03 0.500
    

    In the code above, rollapply calculates the minimum of each value and the two after it. To see how the align argument changes which values fall in each window, compare the following:

    rollapply(1:6, width=3, min, align="left", fill=NA) # [1]  1  2  3  4 NA NA
    rollapply(1:6, width=3, min, align="center", fill=NA) # [1] NA  1  2  3  4 NA
    rollapply(1:6, width=3, min, align="right", fill=NA) # [1] NA NA  1  2  3  4
    

    Because our example uses align="left", each window starts at the current row and looks forward to the next 2 values.
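
    To make the left-aligned window concrete, here is a minimal base R sketch that computes the same kind of rolling minimum by hand, without zoo. The vector x, the width, and the 0.5 threshold are made-up values for illustration only.

    ```r
    # Hypothetical values, not taken from the question's data
    x <- c(0.3, 0.6, 0.7, 0.5, 0.2, 0.9)
    width <- 3
    n <- length(x)

    # For each position i, take the minimum of x[i], x[i+1], x[i+2];
    # positions without a full window ahead of them get NA (like fill=NA)
    manual_min <- sapply(seq_len(n), function(i) {
      if (i + width - 1 > n) NA_real_ else min(x[i:(i + width - 1)])
    })

    # Index of the first position that starts a run of 3 values >= 0.5
    first_start <- which(manual_min >= 0.5)[1]
    ```

    Here manual_min is 0.3, 0.5, 0.2, 0.2, NA, NA, so first_start is 2: position 2 is the first value that is followed by two more values that are all at least 0.5.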

    Lastly, we keep the rows whose rolling minimum is at least 0.5 and take the first such row in each group.
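
    Since the question's data is not shown here, the following is a small end-to-end sketch on invented data. The data frame contents and the min_next3 column name are assumptions made for this example; the rolling minimum is stored in a separate column so the original ANC value of the selected row is kept. It also sorts by Collected_Date within each MRN first, on the assumption that "consecutive" means consecutive in date order.

    ```r
    library(dplyr)
    library(zoo)

    # Hypothetical sample data (not the asker's data)
    df <- data.frame(
      MRN = rep(c("001", "002"), each = 5),
      Collected_Date = rep(as.Date("2015-01-01") + 0:4, times = 2),
      ANC = c(0.2, 0.6, 0.7, 0.8, 0.4,
              0.9, 0.1, 0.6, 0.3, 0.7)
    )

    result <- df %>%
      group_by(MRN) %>%
      arrange(Collected_Date, .by_group = TRUE) %>%
      # minimum of the current ANC and the next two, kept in its own column
      mutate(min_next3 = rollapply(ANC, width = 3, FUN = min,
                                   align = "left", fill = NA)) %>%
      # rows that start a run of three values at or above the threshold
      filter(!is.na(min_next3), min_next3 >= 0.5) %>%
      # first such row per MRN
      slice(1) %>%
      ungroup()
    ```

    With this data, result has a single row: MRN 001 on 2015-01-02 with ANC 0.6. MRN 002 never has three consecutive values at or above 0.5, so it does not appear.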
