I'm using a logistic exposure model to calculate hatching success for bird nests. My data set is quite extensive and I have ~2,000 nests, each with a unique ID ("ClutchID"). I
Collecting some of the comments...
We need only the dplyr package for this problem. If we load other packages, e.g. plyr, it can cause conflicts if both packages have functions with the same name. Let's load only dplyr.
library(dplyr)
In the future, you may wish to load tidyverse instead -- it includes dplyr and other related packages, for graphics, etc.
Let's convert the DateVisit variable from character strings to something R can interpret as a date. Once we do this, it allows R to calculate differences in days by subtracting two dates from each other.
HS_Hatch <- HS_Hatch %>%
  mutate(date_visit = as.Date(DateVisit, "%m/%d/%Y"))
The date format %m/%d/%Y is different from your original code. This date format needs to match how dates look in your data. DateVisit has dates as month/day/year, so we use %m/%d/%Y.
Also, you don't need to specify the dataset for DateVisit inside mutate, as in HS_Hatch$DateVisit, because it's already looking in HS_Hatch. The code HS_Hatch %>% ... says 'use HS_Hatch for the following steps'.
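To make the date handling concrete, here is a minimal base R sketch. The two date strings are made-up values that follow the month/day/year layout described above, and the names date_strings, visits, gap_days and mismatched are only illustrative.

```r
# Toy values in month/day/year form, mirroring the DateVisit layout
date_strings <- c("03/15/2012", "04/03/2012")

# Parse the strings; the format must match how the dates are written
visits <- as.Date(date_strings, format = "%m/%d/%Y")

# Subtracting two Dates gives a difftime; convert it to a plain number of days
gap <- visits[2] - visits[1]
gap_days <- as.numeric(gap, units = "days")

# A format that does not match the strings yields NA instead of an error
mismatched <- as.Date(date_strings, format = "%Y-%m-%d")
```

Checking for NA values right after the conversion, e.g. with sum(is.na(HS_Hatch$date_visit)), is a quick way to spot a wrong format string.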
To calculate exposure, we need to find the first date, last date, and then the difference between the two, for each set of rows by ClutchID. We use summarize, which collapses the data to one row per ClutchID.
exposure <- HS_Hatch %>%
  group_by(ClutchID) %>%
  summarize(first_visit = min(date_visit),
            last_visit = max(date_visit),
            exposure = last_visit - first_visit)
first_visit = min(date_visit) will find the minimum date_visit for each ClutchID separately, since we are using group_by(ClutchID).
exposure = last_visit - first_visit takes the newly-calculated first_visit and last_visit and finds the difference in days.
This creates the following result:
  ClutchID first_visit last_visit exposure
     <int> <date>      <date>        <dbl>
1        1 2012-03-15  2012-04-03       19
2        2 2012-03-18  2012-04-04       17
3        3 2012-03-22  2012-04-04       13
4        4 2012-03-18  2012-04-04       17
5        5 2012-03-20  2012-04-05       16
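Subtracting two dates returns a difftime object, so depending on your package versions the exposure column may print with a different type than shown here. If you need a plain number of days, for example as model input, one option is to convert explicitly. The sketch below uses a small made-up data frame named toy (not the real HS_Hatch) so that it runs on its own.

```r
library(dplyr)

# Hypothetical toy data; the real HS_Hatch has many more rows and columns
toy <- data.frame(
  ClutchID = c(1, 1, 1, 2, 2),
  DateVisit = c("03/15/2012", "03/25/2012", "04/03/2012",
                "03/18/2012", "04/04/2012")
)

exposure_days <- toy %>%
  mutate(date_visit = as.Date(DateVisit, format = "%m/%d/%Y")) %>%
  group_by(ClutchID) %>%
  summarize(first_visit = min(date_visit),
            last_visit = max(date_visit),
            # difftime() with explicit units, then drop the difftime class
            exposure = as.numeric(difftime(last_visit, first_visit,
                                           units = "days")))
```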
If you want to keep all the original rows, you can use mutate in place of summarize.
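As a rough sketch of that mutate variant, again on made-up data (toy and with_exposure are illustrative names), every visit row is kept and the per-clutch values are repeated on each row:

```r
library(dplyr)

# Hypothetical toy data with dates already converted to Date
toy <- data.frame(
  ClutchID = c(1, 1, 1, 2, 2),
  date_visit = as.Date(c("2012-03-15", "2012-03-25", "2012-04-03",
                         "2012-03-18", "2012-04-04"))
)

with_exposure <- toy %>%
  group_by(ClutchID) %>%
  # mutate() keeps one row per visit instead of one row per clutch
  mutate(first_visit = min(date_visit),
         last_visit = max(date_visit),
         exposure = as.numeric(last_visit - first_visit, units = "days")) %>%
  ungroup()
```

The ungroup() at the end is optional, but it avoids later steps being applied per ClutchID by accident.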
Here is a similar solution if you want the difference in days between consecutive dates in a date vector, with no NA values produced in the new column, and if you need to group by several variables.
Make sure that your date vector has been converted to the proper format, as previously explained.
dat2 <- dat %>%
  select(group1, group2, date) %>%
  arrange(group1, group2, date) %>%
  group_by(group1, group2) %>%
  mutate(diff_date = c(0, diff(date)))
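For a self-contained illustration of this approach, here is a sketch on made-up data. The column names follow the snippet above, but the values are invented. It also converts the differences to plain numbers explicitly, which is a small variation on the code above rather than a required change.

```r
library(dplyr)

# Hypothetical toy data, deliberately out of date order within group "a"
dat <- data.frame(
  group1 = c("a", "a", "a", "b", "b"),
  group2 = c("x", "x", "x", "y", "y"),
  date = as.Date(c("2012-03-20", "2012-03-15", "2012-04-03",
                   "2012-03-18", "2012-04-04"))
)

dat2 <- dat %>%
  arrange(group1, group2, date) %>%
  group_by(group1, group2) %>%
  # The first row of each group gets 0; later rows get days since the previous date
  mutate(diff_date = c(0, as.numeric(diff(date), units = "days"))) %>%
  ungroup()
```

Sorting with arrange before diff matters here: without it, the differences would follow the row order of the data rather than the chronological order.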