Question
I'm summarizing a data frame in dplyr with the summarize_all() function. If I do the following:
summarize_all(mydf, list(mean="mean", median="median", sd="sd"))
I get a tibble with 3 variables for each of my original measures, all suffixed by the type (mean, median, sd). Great! But when I try to capture the within-vector n's to calculate the standard deviations myself and to make sure missing cells aren't counted...
summarize_all(mydf, list(mean="mean", median="median", sd="sd", n="n"))
...I get an error:
Error in (function () : unused argument (var_a)
This is not an issue with my var_a vector. If I remove it, I get the same error for var_b, etc. The summarize_all function is producing odd results whenever I request n or n(), or if I use .funs() and list the descriptives I want to compute instead.
What's going on?
Answer 1:
The error occurs because summarize_all() calls each listed function with the column as its argument, and n() takes no arguments, unlike mean() and median(). Use length() instead:
summarize_all(mydf, list(mean="mean", median="median", sd="sd", n="length"))
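One caveat, given that the question wants missing cells left out of the count: length() counts NA values as well. A minimal sketch of a per-column count of non-missing cells, using the ~ lambda syntax, might look like the following. The data frame here is made up for illustration (the original mydf is not shown), and n_obs is just a name chosen for this example.

```r
library(dplyr)

# Hypothetical stand-in for mydf; var_a has one missing value
mydf <- tibble(var_a = c(1, 2, NA, 4), var_b = c(10, 20, 30, 40))

res <- summarize_all(
  mydf,
  list(
    mean  = ~ mean(., na.rm = TRUE),
    sd    = ~ sd(., na.rm = TRUE),
    n     = ~ length(.),      # counts every cell, including NA
    n_obs = ~ sum(!is.na(.))  # counts only non-missing cells
  )
)
res
```

With this data, var_a_n and var_a_n_obs differ by one because of the single NA in var_a.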
Answer 2:
Use the ~ lambda (formula) syntax for finer control, for example to pass additional arguments:
library(dplyr)
mtcars %>%
  summarise_all(list(mean = ~ mean(.), median = ~ median(.), n = ~ n()))
However, computing n() for every column does not make much sense: n() returns the number of rows in the current group, so it is the same for every column. Instead, create n once before summarising:
mtcars %>%
  group_by(n = n()) %>%
  summarise_all(list(mean = mean, median = median))
Otherwise, just pass the unquoted functions:
mtcars %>%
  summarise_all(list(mean = mean, median = median))
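For readers on dplyr 1.0.0 or later, where summarise_all() is superseded by across(), the same idea can be sketched roughly as below. This is not part of the original answer and assumes a recent dplyr version. Note that n = n() is placed after across(); if it came first, across(everything(), ...) would also pick up the new n column.

```r
library(dplyr)

res <- mtcars %>%
  summarise(
    # one mean and one median column per original variable
    across(everything(), list(mean = mean, median = median)),
    # a single row count, computed once rather than per column
    n = n()
  )
res
```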
Source: https://stackoverflow.com/questions/58068522/summarize-all-with-n-function