I have a rather large data frame with a column of POSIXct datetimes (~10 years of hourly data). I would like to flag all the rows whose day falls within a daylight saving period. For ex
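As @beetroot points out in the comments, you can accomplish this with a join: summarise the first and last DST day of each year, then join those limits back onto the original rows: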
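library(dplyr)
# First and last day of year containing a DST hour, joined back onto every row
limits <- span %>%
  group_by(YEAR) %>%
  summarise(minDOY = min(DOY[DLS]), maxDOY = max(DOY[DLS])) %>%
  inner_join(span, by = "YEAR")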
# YEAR minDOY maxDOY date DOY DLS
# 1 2000 93 303 2000-01-01 00:00:00 1 FALSE
# 2 2000 93 303 2000-01-01 01:00:00 1 FALSE
# 3 2000 93 303 2000-01-01 02:00:00 1 FALSE
# 4 2000 93 303 2000-01-01 03:00:00 1 FALSE
# 5 2000 93 303 2000-01-01 04:00:00 1 FALSE
# 6 2000 93 303 2000-01-01 05:00:00 1 FALSE
# 7 2000 93 303 2000-01-01 06:00:00 1 FALSE
# 8 2000 93 303 2000-01-01 07:00:00 1 FALSE
# 9 2000 93 303 2000-01-01 08:00:00 1 FALSE
# 10 2000 93 303 2000-01-01 09:00:00 1 FALSE
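The join only attaches each year's limits to the hourly rows; the flag itself still needs one comparison. A minimal sketch, assuming span has the YEAR, DOY and DLS columns shown in the output. Here they are rebuilt with lubridate's year(), yday() and dst() on hypothetical hourly data for 2000 in America/New_York, an assumed time zone chosen only for illustration:

```r
library(dplyr)
library(lubridate)

# Hypothetical sample data: one year (2000) of hourly timestamps.
# The time zone is an assumption; any zone observing DST behaves the same way.
span <- data.frame(date = seq(as.POSIXct("2000-01-01 00:00:00", tz = "America/New_York"),
                              by = "hour", length.out = 24 * 366))
span$YEAR <- year(span$date)  # calendar year
span$DOY  <- yday(span$date)  # day of year, 1..366
span$DLS  <- dst(span$date)   # TRUE when the hour is in daylight saving time

# Per-year DST limits joined back onto the rows, then compared day by day
limits <- span %>%
  group_by(YEAR) %>%
  summarise(minDOY = min(DOY[DLS]), maxDOY = max(DOY[DLS])) %>%
  inner_join(span, by = "YEAR") %>%
  mutate(CHECK = DOY >= minDOY & DOY <= maxDOY)
```

Because the comparison is on whole days, the first and last DST days are flagged in full, including the few hours on those days that are not themselves DST.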
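dplyr is a great tool, but in this case I'm not sure it's the best one for the job. This accomplishes your task with base R's ave() plus lubridate, flagging every row whose calendar day contains at least one DST hour: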
library(lubridate)  # provides dst() and tz()
span$CHECK <- ave(dst(span$date), as.Date(span$date, tz = tz(span$date)), FUN = any)
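I think ave is a terrible name for this function, but if you can remember it exists, it's often quite useful when you want to join a summary back to the data.frame it came from: it applies FUN within each group and returns a vector as long as the input, so every row receives its group's result.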
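A toy illustration of that broadcasting behaviour, using made-up vectors (dls and day are hypothetical stand-ins for the DST flag and the calendar day), alongside the equivalent dplyr group_by() plus mutate() for comparison:

```r
library(dplyr)

# Hypothetical data: two days, only the first has a DST hour
dls <- c(FALSE, TRUE, FALSE, FALSE, FALSE)
day <- c("d1", "d1", "d1", "d2", "d2")

# ave() runs any() within each day and writes the result back to every row
flag <- ave(dls, day, FUN = any)
print(flag)
#> [1]  TRUE  TRUE  TRUE FALSE FALSE

# The same per-group summary broadcast back to the rows, written with dplyr
out <- data.frame(day = day, dls = dls) %>%
  group_by(day) %>%
  mutate(flag = any(dls)) %>%
  ungroup()
```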
The best solution to get the job done, as suggested by @aosmith, is:
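library(dplyr)
limits <- span %>% group_by(YEAR) %>% mutate(minDOY = min(DOY[DLS]), maxDOY = max(DOY[DLS]), CHECK = FALSE)
# Flag rows whose day of year lies between that year's first and last DST day
limits$CHECK[(limits$DOY >= limits$minDOY) & (limits$DOY <= limits$maxDOY)] <- TRUE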
Using the ave function is a good choice, but I personally prefer to stick with the dplyr package.
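Staying within dplyr, the mutate-then-index steps can also be folded into a single mutate(), because later arguments of mutate() can refer to columns created earlier in the same call. A sketch on hypothetical data (the America/New_York time zone and the construction of YEAR, DOY and DLS are assumptions, not taken from the question):

```r
library(dplyr)
library(lubridate)

# Hypothetical hourly data for 2000; time zone is an assumption
span <- data.frame(date = seq(as.POSIXct("2000-01-01 00:00:00", tz = "America/New_York"),
                              by = "hour", length.out = 24 * 366))
span <- span %>% mutate(YEAR = year(date), DOY = yday(date), DLS = dst(date))

limits <- span %>%
  group_by(YEAR) %>%
  mutate(minDOY = min(DOY[DLS]),                   # first DST day of the year
         maxDOY = max(DOY[DLS]),                   # last DST day of the year
         CHECK  = DOY >= minDOY & DOY <= maxDOY) %>%  # uses the columns just created
  ungroup()
```

If a year contained no DST hours at all, min() and max() on the empty selection would return Inf and -Inf with a warning, so CHECK would simply be FALSE for that year.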