Calculate Difference between dates by group in R

忘掉有多难 2020-12-11 22:46

I'm using a logistic exposure to calculate hatching success for bird nests. My data set is quite extensive: I have ~2,000 nests, each with a unique ID ("ClutchID"). I

2 answers
  • 2020-12-11 23:28

    Collecting some of the comments...

    Load dplyr

    We need only the dplyr package for this problem. If we load other packages, e.g. plyr, it can cause conflicts if both packages have functions with the same name. Let's load only dplyr.

    library(dplyr)
    

    In the future, you may wish to load tidyverse instead -- it includes dplyr and other related packages, for graphics, etc.
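    As a small sketch of how to sidestep the name-conflict issue mentioned above, the functions can also be called with an explicit dplyr:: prefix. The visits data frame and its eggs column are made up for illustration and are not part of the original data.

```r
library(dplyr)

# Small made-up data frame, for illustration only.
visits <- data.frame(
  ClutchID = c(1, 1, 2, 2),
  eggs     = c(3, 4, 2, 5)
)

# Writing dplyr::summarize pins the call to dplyr's version,
# even if another attached package masks the name `summarize`.
eggs_by_clutch <- visits %>%
  dplyr::group_by(ClutchID) %>%
  dplyr::summarize(max_eggs = max(eggs))
```

    The prefix is more typing, but it makes the result independent of the order in which packages were loaded.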

    Converting dates

    Let's convert the DateVisit variable from character strings to something R can interpret as a date. Once we do this, it allows R to calculate differences in days by subtracting two dates from each other.

    HS_Hatch <- HS_Hatch %>%
     mutate(date_visit = as.Date(DateVisit, "%m/%d/%Y"))
    

    The date format %m/%d/%Y is different from your original code. This date format needs to match how dates look in your data. DateVisit has dates as month/day/year, so we use %m/%d/%Y.
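    A quick way to check the format string is to try it on a couple of values first. The sketch below uses made-up date strings in month/day/year order; the variable names are hypothetical.

```r
# Made-up date strings in month/day/year order, for illustration only.
raw_dates <- c("03/15/2012", "04/03/2012")

# The format matches the strings, so parsing succeeds.
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

# Subtracting two Date values gives a difftime;
# as.numeric() extracts the number of days as a plain number.
days_between <- as.numeric(parsed[2] - parsed[1], units = "days")

# A format that does not match the strings yields NA rather than an error.
mismatched <- as.Date(raw_dates, format = "%Y-%m-%d")
```

    If a conversion silently produces NA for every row, a mismatched format string is the first thing to check.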

    Also, you don't need to specify the dataset for DateVisit inside mutate, as in HS_Hatch$DateVisit, because it's already looking in HS_Hatch. The code HS_Hatch %>% ... says 'use HS_Hatch for the following steps'.

    Calculating exposures

    To calculate exposure, we need to find the first date, last date, and then the difference between the two, for each set of rows by ClutchID. We use summarize, which collapses the data to one row per ClutchID.

    exposure <- HS_Hatch %>% 
        group_by(ClutchID) %>%
        summarize(first_visit = min(date_visit), 
                  last_visit = max(date_visit), 
                  exposure = last_visit - first_visit)
    

    first_visit = min(date_visit) will find the minimum date_visit for each ClutchID separately, since we are using group_by(ClutchID).

    exposure = last_visit - first_visit takes the newly-calculated first_visit and last_visit and finds the difference in days.

    This creates the following result:

      ClutchID first_visit last_visit exposure
         <int>      <date>     <date>    <dbl>
    1        1  2012-03-15 2012-04-03       19
    2        2  2012-03-18 2012-04-04       17
    3        3  2012-03-22 2012-04-04       13
    4        4  2012-03-18 2012-04-04       17
    5        5  2012-03-20 2012-04-05       16
    

    If you want to keep all the original rows, you can use mutate in place of summarize.
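    A minimal sketch of that mutate variant, using a made-up HS_Hatch with two nests (the real data has more columns and rows). Wrapping the difference in as.numeric() is an addition here, so that exposure is stored as a plain number of days.

```r
library(dplyr)

# Made-up visits for two nests; column names follow the answer above.
HS_Hatch <- data.frame(
  ClutchID  = c(1, 1, 1, 2, 2),
  DateVisit = c("03/15/2012", "03/25/2012", "04/03/2012",
                "03/18/2012", "04/04/2012")
)

all_rows <- HS_Hatch %>%
  mutate(date_visit = as.Date(DateVisit, "%m/%d/%Y")) %>%
  group_by(ClutchID) %>%
  # mutate keeps every visit and repeats the per-nest values on each row
  mutate(first_visit = min(date_visit),
         last_visit  = max(date_visit),
         exposure    = as.numeric(last_visit - first_visit, units = "days")) %>%
  ungroup()
```

    Calling ungroup() at the end avoids later steps being applied per ClutchID by accident.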

  • 2020-12-11 23:38

    Here is a similar solution if you want the difference in days between consecutive dates within each group, with no NA values produced in the new column, and you need to group by several variables.

    Make sure that your date column has already been converted to the Date class, as explained in the previous answer.

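    dat2 <- dat %>% 
      select(group1, group2, date) %>% 
      # sort so that diff() compares consecutive dates within each group
      arrange(group1, group2, date) %>% 
      group_by(group1, group2) %>% 
      # the first row of each group has no previous date, so pad with 0 instead of NA
      mutate(diff_date = c(0, diff(date)))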
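    To see what this produces, here is a self-contained sketch with a made-up dat. The group labels and dates are invented, and the explicit as.numeric(..., units = "days") is an addition that makes the unit unambiguous before padding with 0.

```r
library(dplyr)

# Made-up data with two grouping columns, for illustration only.
dat <- data.frame(
  group1 = c("a", "a", "a", "b", "b"),
  group2 = c("x", "x", "x", "y", "y"),
  date   = as.Date(c("2012-03-15", "2012-03-25", "2012-04-03",
                     "2012-03-18", "2012-04-04"))
)

dat2 <- dat %>%
  arrange(group1, group2, date) %>%
  group_by(group1, group2) %>%
  # as.numeric() makes the unit explicit before padding with a leading 0
  mutate(diff_date = c(0, as.numeric(diff(date), units = "days"))) %>%
  ungroup()
```

    Each group starts at 0 and every later row holds the number of days since the previous date in the same group.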
    