I'm using the dplyr package (dplyr 0.4.3; R 3.2.3) for basic summary of grouped data (summarise), but get inconsistent results (NaN f
The transformations you specify in summarize are performed in the order they appear. That means that if you change a variable's values, those new values are what the subsequent columns see (this is different from the base function transform()). Consider your call:
df %>% group_by(time) %>%
  summarise(glucose=mean(glucose, na.rm=TRUE),
            glucose.sd=sd(glucose, na.rm=TRUE),
            n=sum(!is.na(glucose)))
The glucose=mean(glucose, na.rm=TRUE) part has replaced the glucose variable, so when the glucose.sd=sd(glucose, na.rm=TRUE) part is evaluated, sd() does not see the original glucose values; it sees the new value, which is the mean of the original values. That is a single number per group, and the standard deviation of a single value is undefined. If you re-order the columns so that glucose is overwritten last, it will work:
df %>% group_by(time) %>%
  summarise(glucose.sd=sd(glucose, na.rm=TRUE),
            n=sum(!is.na(glucose)),
            glucose=mean(glucose, na.rm=TRUE))
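Another way to avoid depending on column order is to give the mean a name that does not shadow the input column. The following is only a sketch: the data frame is made-up toy data (the original df is not shown), and glucose.mean is an illustrative name, not one from the question.

```r
library(dplyr)

# Hypothetical toy data standing in for the original df.
df <- data.frame(
  time    = c(1, 1, 1, 2, 2, 2),
  glucose = c(5, 6, NA, 7, 9, 8)
)

# The mean is stored under a new name, so the glucose column is never
# overwritten and sd() and the count still see the raw values.
res <- df %>%
  group_by(time) %>%
  summarise(glucose.mean = mean(glucose, na.rm = TRUE),
            glucose.sd   = sd(glucose, na.rm = TRUE),
            n            = sum(!is.na(glucose)))
```

With this naming, the three summaries can be listed in any order.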
This is the default behavior because it is often useful to create a column and then use that column's value later in the same call. For example, with mutate():
df %>% group_by(time) %>%
  mutate(glucose_sq = glucose^2,
         glucose_sq_plus2 = glucose_sq+2)
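The difference from base transform() mentioned earlier can be checked with a small comparison. This is a minimal sketch on made-up data; the column names x and y are purely illustrative.

```r
library(dplyr)

# Hypothetical toy data for illustration only.
df <- data.frame(x = c(1, 2, 3))

# Base transform() evaluates every expression against the original columns,
# so y is computed from the old x.
base_res <- transform(df, x = x * 2, y = x + 1)

# dplyr's mutate() evaluates expressions in order, so y is computed from
# the already doubled x.
dplyr_res <- mutate(df, x = x * 2, y = x + 1)
```

Here base_res$y is 2, 3, 4, while dplyr_res$y is 3, 5, 7; the x column is 2, 4, 6 in both.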