Question
I have a template that I use to aggregate my data from its source into means and 95% confidence intervals so that I can plot them in ggplot. It was originally adapted from a Stack Overflow post many years ago (apologies, but I don't know the original source) and looks like this:
data %>%
  group_by(var1, var2) %>%
  summarise(count = n(),
            mean.outcome_variable = mean(outcome_variable, na.rm = TRUE),
            sd.outcome_variable = sd(outcome_variable, na.rm = TRUE),
            n.outcome_variable = n(),
            total.outcome_variable = sum(outcome_variable)) %>%
  mutate(se.outcome_variable = sd.outcome_variable / sqrt(n.outcome_variable),
         lower.ci.outcome_variable = mean.outcome_variable - qt(1 - (0.05 / 2), n.outcome_variable - 1) * se.outcome_variable,
         upper.ci.outcome_variable = mean.outcome_variable + qt(1 - (0.05 / 2), n.outcome_variable - 1) * se.outcome_variable)
This works well with one or two outcome variables but becomes impractical to copy and paste when there are many. Since all of my outcome variables are numeric, I was hoping to use summarise_if instead. However, I do not know how to specify anything more complex than a simple function such as "mean" or "sd" in the "funs" argument. I have tried gmodels::ci() as follows:
dataset_aggregated <- data %>%
  group_by(var1, var2) %>%
  summarise_if(is.numeric, funs(mean, lowCI = ci()[2], hiCI = ci()[3])) # does not work without brackets either
However, this results in:
Error in summarise_impl(.data, dots) :
Evaluation error: no applicable method for 'ci' applied to an object of class "NULL".
How do I get this to work?
Answer 1:
I worked out how to do this just as I got the question ready to post, but I thought I'd share it in case anyone else is having the same issue. The answer is surprisingly simple: I wrote custom lci() and uci() functions that each pull one bound out of the result of gmodels::ci(), and called these instead, e.g.
library(dplyr)
library(gmodels)

# Lower bound: second element of the vector returned by gmodels::ci()
lci <- function(data) {
  as.numeric(ci(data)[2])
}
# Upper bound: third element of the vector returned by gmodels::ci()
uci <- function(data) {
  as.numeric(ci(data)[3])
}
dataset_aggregated <- dataset %>%
  group_by(var1, var2) %>% # you can group by however many you want here, just put them in the select statement below
  summarise_if(is.numeric, funs(mean, lci, uci)) %>%
  select(var1, var2, sort(current_vars())) # sorts columns into lci, mean, uci for each outcome variable alphabetically
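For readers on newer dplyr releases: funs() has been deprecated since dplyr 0.8.0, and summarise_if() was superseded by across() in dplyr 1.0.0. Below is a minimal sketch of the same wrapper-function idea written with across(). It is an illustration and not part of the original answer. The helper ci_limit() and the toy data frame are hypothetical, and the helper uses the t-based formula from the question instead of gmodels::ci(), on the assumption that this is the interval you want.

```r
library(dplyr)

# Hypothetical helper (not from the original answer): one t-based confidence
# limit, following the formula in the question:
#   mean -/+ qt(1 - alpha / 2, n - 1) * sd / sqrt(n)
# Missing values are dropped before n is counted.
ci_limit <- function(x, side = c("lower", "upper"), level = 0.95) {
  side <- match.arg(side)
  x <- x[!is.na(x)]
  n <- length(x)
  half_width <- qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
  if (side == "lower") mean(x) - half_width else mean(x) + half_width
}
lci <- function(x) ci_limit(x, "lower")
uci <- function(x) ci_limit(x, "upper")

# Toy data, invented purely for illustration
toy <- data.frame(
  var1   = rep(c("a", "b"), each = 4),
  height = c(1, 2, 3, 4, 10, 20, 30, 40),
  weight = c(5, 5, 6, 6, 7, 8, 9, NA)
)

toy_aggregated <- toy %>%
  group_by(var1) %>%
  summarise(
    # A named list of functions (or ~ lambdas) replaces funs();
    # output columns are named <column>_<function name>
    across(where(is.numeric),
           list(mean = ~ mean(.x, na.rm = TRUE), lci = lci, uci = uci)),
    .groups = "drop"
  )

print(toy_aggregated)
```

Because across() keeps the list order within each column, the output columns already come out grouped per outcome variable (mean, lci, uci), so the extra select() and sort() step is optional here.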
Source: https://stackoverflow.com/questions/59287542/how-to-pass-more-complex-functions-to-summarise-if-or-mutate-if