Using the result of summarise (dplyr) to mutate the original dataframe

I have a rather big dataframe with a column of POSIXct datetimes (~10 years of hourly data). I would like to flag all the rows in which the day falls within a Daylight Saving Time period. For ex

3 Answers

忘了有多久

2020-12-18 12:58

As @beetroot points out in the comments, you can accomplish this with a join:

```
limits = span %>% 
   group_by(YEAR) %>% 
   summarise(minDOY=min(DOY[DLS]),maxDOY=max(DOY[DLS])) %>%
   inner_join(span, by='YEAR')
#    YEAR minDOY maxDOY                date DOY   DLS
# 1  2000     93    303 2000-01-01 00:00:00   1 FALSE
# 2  2000     93    303 2000-01-01 01:00:00   1 FALSE
# 3  2000     93    303 2000-01-01 02:00:00   1 FALSE
# 4  2000     93    303 2000-01-01 03:00:00   1 FALSE
# 5  2000     93    303 2000-01-01 04:00:00   1 FALSE
# 6  2000     93    303 2000-01-01 05:00:00   1 FALSE
# 7  2000     93    303 2000-01-01 06:00:00   1 FALSE
# 8  2000     93    303 2000-01-01 07:00:00   1 FALSE
# 9  2000     93    303 2000-01-01 08:00:00   1 FALSE
# 10 2000     93    303 2000-01-01 09:00:00   1 FALSE
```
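The join only attaches the per-year limits to each row; the flag itself still has to be computed. Here is a minimal sketch of that last step, assuming a small hand-made `span` (the values below are illustrative, not the asker's data) and reusing the `CHECK` column name from the other answers:

```r
library(dplyr)

# Toy stand-in for `span`: DOY 93 of year 2000 appears twice, once before
# and once after the switch, so only one of its rows has DLS = TRUE.
span <- data.frame(
  YEAR = c(2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001),
  DOY  = c(1, 93, 93, 200, 310, 1, 91, 305),
  DLS  = c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
)

limits <- span %>%
  group_by(YEAR) %>%
  summarise(minDOY = min(DOY[DLS]), maxDOY = max(DOY[DLS])) %>%
  inner_join(span, by = "YEAR") %>%
  # A row is flagged when its day lies between the first and last DLS day
  mutate(CHECK = DOY >= minDOY & DOY <= maxDOY)
```

Both rows for day 93 end up flagged, even though only one of them has `DLS` set, which is the point of flagging by day rather than by hour.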

时光取名叫无心

2020-12-18 13:04
dplyr is a great tool, but in this case I'm not sure it's the best for the job. This accomplishes your task:
```
library(lubridate)  # provides dst() and tz()
span$CHECK <- ave(dst(span$date), as.Date(span$date, tz = tz(span$date)), FUN = any)
```
I think ave is a terrible name for this function, but if you can remember it exists, it's often quite useful when you want to join a summary back to the data.frame it came from.
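To see what `ave` does here, a small base-R sketch may help. It assumes a hand-made logical vector `flag` standing in for `dst(span$date)`, and uses UTC timestamps so the result does not depend on the local time zone database:

```r
# Six hourly timestamps spanning two calendar days (UTC)
date <- as.POSIXct("2000-04-01 22:00:00", tz = "UTC") + 3600 * 0:5

# Hypothetical per-hour flag: only the last two hours are "in DST"
flag <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE)

# Group by calendar day; any() returns one value per group and ave()
# recycles it back to every row of that group
check <- ave(flag, as.Date(date, tz = "UTC"), FUN = any)
```

The two hours on 2000-04-01 stay `FALSE`, while all four hours on 2000-04-02 become `TRUE`, because at least one hour of that day is flagged.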
一生所求

2020-12-18 13:04
The best solution to get the job done, as suggested by @aosmith, is:
```
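library(dplyr)
# Per-year first and last day of year with DLS set, added as new columns
limits = span %>% group_by(YEAR) %>% mutate(minDOY=min(DOY[DLS]),maxDOY=max(DOY[DLS]),CHECK=FALSE)

# Flag every row whose day lies between those limits
limits$CHECK[(limits$DOY >= limits$minDOY) & (limits$DOY <= limits$maxDOY) ] = TRUE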
```
The use of the ave function is a good choice, but I personally prefer to stick to the 'dplyr' package.
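If the intermediate `minDOY` and `maxDOY` columns are not needed, the same idea might be written as a single grouped `mutate`. This is only a sketch on hand-made toy data, not the asker's dataframe:

```r
library(dplyr)

# Toy stand-in for `span`
span <- data.frame(
  YEAR = c(2000, 2000, 2000, 2000, 2001, 2001, 2001),
  DOY  = c(1, 93, 200, 310, 1, 91, 305),
  DLS  = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
)

limits <- span %>%
  group_by(YEAR) %>%
  # min()/max() are evaluated once per year and recycled over its rows
  mutate(CHECK = DOY >= min(DOY[DLS]) & DOY <= max(DOY[DLS])) %>%
  ungroup()
```

Note that a year with no `DLS` rows at all would make `min()` and `max()` run on an empty vector, which gives a warning and flags nothing for that year.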