I'm using a logistic exposure to calculate hatching success for bird nests. My data set is quite extensive and I have ~2,000 nests, each with a unique ID ("ClutchID"). I
Collecting some of the comments...
dplyr
We need only the dplyr package for this problem. Loading other packages as well, e.g. plyr, can cause conflicts if both packages have functions with the same name. Let's load only dplyr.
library(dplyr)
In the future, you may wish to load tidyverse instead; it includes dplyr and other related packages for graphics, etc.
Let's convert the DateVisit variable from character strings to something R can interpret as a date. Once that is done, R can calculate differences in days by subtracting one date from another.
HS_Hatch <- HS_Hatch %>%
  mutate(date_visit = as.Date(DateVisit, "%m/%d/%Y"))
The date format %m/%d/%Y is different from your original code. The format needs to match how the dates look in your data. DateVisit has dates as month/day/year, so we use %m/%d/%Y.
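As a small illustration of why the format string matters, here is a sketch using two made-up date strings (not taken from your data) written as month/day/year. With the matching format the dates parse and can be subtracted; with a mismatched format the first one becomes NA and the second is silently read as a different day.

```r
# Hypothetical example strings in month/day/year order
dates_chr <- c("03/15/2012", "04/03/2012")

# Matching format: both strings parse to Date values
parsed <- as.Date(dates_chr, format = "%m/%d/%Y")

# Subtracting two Date values gives the gap between them in days
gap <- parsed[2] - parsed[1]

# Mismatched format (day/month/year): "03/15/2012" has no month 15, so it
# becomes NA, and "04/03/2012" is read as 4 March instead of 3 April
wrong <- as.Date(dates_chr, format = "%d/%m/%Y")
```

Checking for unexpected NA values after the conversion, e.g. with sum(is.na(HS_Hatch$date_visit)), is a quick way to catch a format mismatch.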
Also, you don't need to specify the dataset for DateVisit inside mutate, as in HS_Hatch$DateVisit, because it's already looking in HS_Hatch. The code HS_Hatch %>% ... says 'use HS_Hatch for the following steps'.
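To make the pipe concrete, the sketch below compares the piped call with the equivalent call that passes the data frame as the first argument. The toy data frame is hypothetical and only stands in for HS_Hatch.

```r
library(dplyr)

# Hypothetical stand-in for HS_Hatch
toy <- data.frame(DateVisit = c("03/15/2012", "04/03/2012"))

# Piped form: toy is handed to mutate() as its first argument
piped <- toy %>%
  mutate(date_visit = as.Date(DateVisit, "%m/%d/%Y"))

# Direct form: the same call without the pipe
direct <- mutate(toy, date_visit = as.Date(DateVisit, "%m/%d/%Y"))

# Both forms should produce the same data frame
same_result <- identical(piped, direct)
```

In both forms, DateVisit is looked up inside the data frame, which is why the toy$DateVisit prefix is unnecessary.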
To calculate exposure, we need to find the first date, the last date, and then the difference between the two, for each set of rows sharing a ClutchID. We use summarize, which collapses the data to one row per ClutchID.
exposure <- HS_Hatch %>%
  group_by(ClutchID) %>%
  summarize(first_visit = min(date_visit),
            last_visit = max(date_visit),
            exposure = last_visit - first_visit)
first_visit = min(date_visit) will find the minimum date_visit for each ClutchID separately, since we are using group_by(ClutchID).
exposure = last_visit - first_visit takes the newly calculated first_visit and last_visit and finds the difference in days.
This creates the following result:
ClutchID first_visit last_visit exposure
<int> <date> <date> <dbl>
1 1 2012-03-15 2012-04-03 19
2 2 2012-03-18 2012-04-04 17
3 3 2012-03-22 2012-04-04 13
4 4 2012-03-18 2012-04-04 17
5 5 2012-03-20 2012-04-05 16
If you want to keep all the original rows, you can use mutate in place of summarize.
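A minimal sketch of the mutate variant follows, using a small made-up data frame in place of HS_Hatch (the rows and the toy_exposure name are assumptions for illustration). It also wraps the difference in as.numeric(..., units = "days"), an optional step that stores exposure as a plain number rather than a difftime.

```r
library(dplyr)

# Hypothetical data: three visits to clutch 1, two visits to clutch 2
toy <- data.frame(
  ClutchID  = c(1, 1, 1, 2, 2),
  DateVisit = c("03/15/2012", "03/25/2012", "04/03/2012",
                "03/18/2012", "04/04/2012")
)

toy_exposure <- toy %>%
  mutate(date_visit = as.Date(DateVisit, "%m/%d/%Y")) %>%
  group_by(ClutchID) %>%
  # mutate() keeps every visit row and repeats the per-clutch values on each
  mutate(first_visit = min(date_visit),
         last_visit  = max(date_visit),
         exposure    = as.numeric(last_visit - first_visit, units = "days")) %>%
  ungroup()
```

Every visit row is retained, and each row carries the exposure of its own clutch.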
Here is a similar solution if you want difftime results in days from a vector of dates, without NA values in the new column, and if you need to group by several conditions/groups.
Make sure that your date vector has been converted to the proper format, as explained previously.
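dat2 <- dat %>%
  select(group1, group2, date) %>%
  # sort so that consecutive rows within a group are in date order
  arrange(group1, group2, date) %>%
  group_by(group1, group2) %>%
  # diff() returns one value fewer than the group has rows, so prepend 0
  mutate(diff_date = c(0, diff(date)))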
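To see what this produces, here is a sketch on made-up data. The contents of dat are hypothetical, and the as.numeric(..., units = "days") wrapper is an addition that makes the unit explicit.

```r
library(dplyr)

# Hypothetical data: two grouping columns and an already-converted Date column
dat <- data.frame(
  group1 = c("a", "a", "a", "b", "b"),
  group2 = c("x", "x", "x", "y", "y"),
  date   = as.Date(c("2012-03-25", "2012-03-15", "2012-04-03",
                     "2012-03-18", "2012-04-04"))
)

dat2 <- dat %>%
  arrange(group1, group2, date) %>%
  group_by(group1, group2) %>%
  # first row of each group gets 0; later rows get days since the previous row
  mutate(diff_date = c(0, as.numeric(diff(date), units = "days"))) %>%
  ungroup()
```

Unlike the summarize approach, this gives the gap between consecutive visits on every row; summing diff_date within a group recovers the span from first to last date.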