Calculate Difference between dates by group in R

后端未结


I'm using a logistic exposure to calculate hatching success for bird nests. My data set is quite extensive and I have ~2,000 nests, each with a unique ID ("ClutchID"). I

2 Answers

温柔的废话

2020-12-11 23:28
Collecting some of the comments...

Load dplyr

We need only the dplyr package for this problem. If we load other packages, e.g. plyr, it can cause conflicts if both packages have functions with the same name. Let's load only dplyr.
```
library(dplyr)
```
In the future, you may wish to load tidyverse instead -- it includes dplyr and other related packages, for graphics, etc.

Converting dates

Let's convert the DateVisit variable from character strings to something R can interpret as a date. Once we do this, it allows R to calculate differences in days by subtracting two dates from each other.
```
HS_Hatch <- HS_Hatch %>%
 mutate(date_visit = as.Date(DateVisit, "%m/%d/%Y"))
```
The date format %m/%d/%Y is different from your original code. This date format needs to match how dates look in your data. DateVisit has dates as month/day/year, so we use %m/%d/%Y.
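
To see why the format string matters, here is a small base R sketch. The date strings are made-up examples in the same month/day/year layout, not values taken from your data.

```r
# Hypothetical date strings in month/day/year layout
good  <- as.Date("3/15/2012", format = "%m/%d/%Y")  # parsed as 2012-03-15
bad   <- as.Date("3/15/2012", format = "%d/%m/%Y")  # NA, because there is no month 15
later <- as.Date("4/3/2012",  format = "%m/%d/%Y")

# Subtracting two Date objects gives a difftime measured in days
later - good   # Time difference of 19 days
```

If a column comes back full of NA after conversion, a mismatched format string is the first thing to check.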

Also, you don't need to specify the dataset for DateVisit inside mutate, as in HS_Hatch$DateVisit, because it's already looking in HS_Hatch. The code HS_Hatch %>% ... says 'use HS_Hatch for the following steps'.
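
As a rough illustration of what the pipe does, the two calls below should be equivalent. The two-row data frame is a hypothetical stand-in for HS_Hatch.

```r
library(dplyr)

# Hypothetical two-row stand-in for HS_Hatch
HS_Hatch <- data.frame(ClutchID  = c(1, 1),
                       DateVisit = c("3/15/2012", "4/3/2012"))

# Piped form: HS_Hatch is passed as the first argument of mutate()
piped <- HS_Hatch %>%
  mutate(date_visit = as.Date(DateVisit, "%m/%d/%Y"))

# The same call written without the pipe
direct <- mutate(HS_Hatch, date_visit = as.Date(DateVisit, "%m/%d/%Y"))

identical(piped, direct)
```

In both forms, DateVisit is looked up inside HS_Hatch, which is why the HS_Hatch$ prefix is unnecessary.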

Calculating exposures

To calculate exposure, we need to find the first date, last date, and then the difference between the two, for each set of rows by ClutchID. We use summarize, which collapses the data to one row per ClutchID.
```
exposure <- HS_Hatch %>% 
    group_by(ClutchID) %>%
    summarize(first_visit = min(date_visit), 
              last_visit = max(date_visit), 
              exposure = last_visit - first_visit)
```
first_visit = min(date_visit) will find the minimum date_visit for each ClutchID separately, since we are using group_by(ClutchID).

exposure = last_visit - first_visit takes the newly-calculated first_visit and last_visit and finds the difference in days.

This creates the following result:
```
  ClutchID first_visit last_visit exposure
     <int>      <date>     <date>    <dbl>
1        1  2012-03-15 2012-04-03       19
2        2  2012-03-18 2012-04-04       17
3        3  2012-03-22 2012-04-04       13
4        4  2012-03-18 2012-04-04       17
5        5  2012-03-20 2012-04-05       16
```
If you want to keep all the original rows, you can use mutate in place of summarize.
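
A minimal sketch of that mutate variant follows. The small data frame is a hypothetical stand-in for HS_Hatch, and the name exposure_rows is made up for illustration. The as.numeric() call is an optional extra that stores the exposure as a plain number of days rather than a difftime.

```r
library(dplyr)

# Hypothetical stand-in for HS_Hatch with two nests
HS_Hatch <- data.frame(
  ClutchID  = c(1, 1, 1, 2, 2),
  DateVisit = c("3/15/2012", "3/25/2012", "4/3/2012", "3/18/2012", "4/4/2012")
)

exposure_rows <- HS_Hatch %>%
  mutate(date_visit = as.Date(DateVisit, "%m/%d/%Y")) %>%
  group_by(ClutchID) %>%
  # mutate() keeps every visit row and repeats the per-nest values on each one
  mutate(first_visit = min(date_visit),
         last_visit  = max(date_visit),
         # as.numeric() turns the difftime into a plain number of days
         exposure    = as.numeric(last_visit - first_visit, units = "days")) %>%
  ungroup()
```

Here all five visit rows are kept, and each row carries the exposure of its own nest (19 days for nest 1, 17 days for nest 2).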
死守一世寂寞

2020-12-11 23:38
Here is a similar solution if you want difftime results in days from a vector of dates, with no NA values produced in the new column, and if you need to group by several conditions/groups.

Make sure that your date vector has been converted to the proper format, as previously explained.
```
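dat2 <- dat %>%
  select(group1, group2, date) %>%
  # sort so that differences run in date order within each group
  arrange(group1, group2, date) %>%
  group_by(group1, group2) %>%
  # the first row of each group gets 0 rather than NA
  mutate(diff_date = c(0, diff(date)))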
```
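
To show what c(0, diff(date)) does inside each group, here is a base R sketch on a single, already sorted vector. The dates are invented for illustration.

```r
# Hypothetical visit dates for one group, already sorted
dates <- as.Date(c("2012-03-15", "2012-03-20", "2012-04-03"))

# diff() returns the gaps between consecutive dates, one element shorter than the input
gaps <- diff(dates)

# Prepending 0 restores the original length, so the first row is 0 instead of NA
diff_date <- c(0, gaps)   # values 0, 5, 14 (days)
```

Because diff() only compares neighbouring elements, the arrange() step matters: unsorted dates would produce negative or misleading gaps.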