Tidyeval with list of column names in a function

余生颓废 submitted on 2019-12-29 08:20:26

Question


I am trying to write a function that passes a list of column names to a dplyr function. I know how to do this when the column names are supplied through ..., as explained in the tidyeval documentation:

library(dplyr)

df <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a = sample(5),
  b = sample(5)
)

my_summarise <- function(df, ...) {
  group_var <- quos(...)

  df %>%
    group_by(!!!group_var) %>%
    summarise(a = mean(a))
}

my_summarise(df, g1, g2)

But if I want to pass the column names as a single argument of the function, the approach above no longer works:

my_summarise <- function(df, group_var, sum_var) {
  group_var <- quos(group_var) # nor enquo(group_var)
  sum_var <- enquo(sum_var)

  df %>%
    group_by(!!!group_var) %>%
    summarise(a = mean(a))
}

my_summarise(df, list(g1, g2), a)
my_summarise(df, list(g1, g2), b)

How can I get the items inside the list to be quoted individually?

This question is similar to "Passing dataframe column names in a function inside another function", but the comments there suggested using strings, whereas here I would like to use bare column names.


Answer 1:


You could pass the column names with alist instead of list, because alist does not evaluate its arguments.

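my_summarise = function(df, group_var, sum_var) {
    # group_var arrives as a list of unevaluated symbols; convert it to quosures
    group_var = quos(!!! group_var)
    # capture the bare column name passed as sum_var
    sum_var = enquo(sum_var)

    df %>%
        group_by(!!! group_var) %>%
        # name the output column after sum_var and take its mean
        summarise(!! quo_name(sum_var) := mean(!! sum_var))
}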

my_summarise(df, alist(g1, g2), b)

# A tibble: 4 x 3
# Groups:   g1 [?]
     g1    g2     b
  <dbl> <dbl> <dbl>
1     1     1   2.0
2     1     2   3.0
3     2     1   4.5
4     2     2   1.0
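
To see why alist helps here, the difference between list and alist can be checked in base R alone. This is a minimal sketch; it assumes no objects named g1 or g2 exist in the calling environment, and the names cols and res are only illustrative.

```r
# alist() keeps its arguments as unevaluated symbols
cols <- alist(g1, g2)
is.symbol(cols[[1]])
as.character(cols[[2]])

# list() evaluates its arguments immediately. Assuming g1 is not defined
# in the calling environment, this fails before dplyr ever sees the columns.
res <- tryCatch(
  list(g1, g2),
  error = function(e) conditionMessage(e)
)
print(res)
```

Because the symbols are stored unevaluated, quos(!!! group_var) inside the function can turn each one into a quosure that group_by then resolves against the data frame.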

Alternatively, build the argument with quos instead of list at the call site, as shown in this answer. The function then no longer needs to convert group_var itself, which avoids some complications altogether.

my_summarise = function(df, group_var, sum_var) {
    # group_var = quos(!!! group_var)
    sum_var = enquo(sum_var)

    df %>%
        group_by(!!! group_var) %>%
        summarise(!! quo_name( sum_var) := mean( !! sum_var) )
}

my_summarise(df, quos(g1, g2), b)

# A tibble: 4 x 3
# Groups:   g1 [?]
     g1    g2     b
  <dbl> <dbl> <dbl>
1     1     1   2.0
2     1     2   3.0
3     2     1   4.5
4     2     2   1.0
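
As a point of comparison, and not part of the original answer, newer dplyr releases allow a similar interface without alist or quos. The sketch below assumes dplyr 1.0.0 or later, where across() and the {{ }} operator are available. The name my_summarise2 is hypothetical, and fixed values replace sample(5) so the result is reproducible.

```r
library(dplyr)

# Hypothetical variant of my_summarise; assumes dplyr >= 1.0.0
my_summarise2 <- function(df, group_var, sum_var) {
  df %>%
    # {{ }} forwards the caller's tidy-select expression, e.g. c(g1, g2)
    group_by(across({{ group_var }})) %>%
    summarise(across({{ sum_var }}, mean), .groups = "drop")
}

df2 <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a = 1:5,
  b = 5:1
)

out <- my_summarise2(df2, c(g1, g2), b)
print(out)
```

Here the grouping columns are written as c(g1, g2) rather than list(g1, g2), because across() expects a tidy-select expression.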



Answer 2:


library(dplyr)

df <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a = sample(5), 
  b = sample(5)
)

my_summarise = function(df, group_var, fun_name) {

  df %>%
    group_by(!!! group_var) %>%
    summarize_all(fun_name)
}

my_summarise(df, alist(g1, g2), mean)

alist() treats 'g1' and 'g2' like function arguments (it does not evaluate them), while !!! (the same as UQS()) unquotes and splices the list. sum_var is not necessary, as it looks like you want to take the mean of both 'a' and 'b'. You can also generalize the function by passing in the summary function as well.
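
The splicing step can be inspected without running dplyr at all. The sketch below uses rlang::expr() to build the call rather than evaluate it; the names spliced and unspliced are only illustrative.

```r
library(rlang)

group_var <- alist(g1, g2)

# expr() builds the call without evaluating it, so the effect of !!! is visible
spliced <- expr(group_by(df, !!!group_var))
print(spliced)

# With a single !!, the whole list is inserted as one argument instead
unspliced <- expr(group_by(df, !!group_var))

length(spliced)    # function name, df, and two grouping columns
length(unspliced)  # function name, df, and one list argument
```

With !!!, each element of the list becomes its own argument to group_by, which is what lets a single function parameter carry several bare column names.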



Source: https://stackoverflow.com/questions/47993471/tidyeval-with-list-of-column-names-in-a-function
