How to parametrize function calls in dplyr 0.7?

五迷三道 提交于 2019-11-27 01:02:54

dplyr will have a specialized group_by function group_by_at to deal with multiple grouping variables. It would be much easier to use the new member of the _at family:

# using the pre-release 0.6.0

cols <- c("am","gear")

mtcars %>%
    group_by_at(.vars = cols) %>%
    summarise(mean_cyl=mean(cyl))

# Source: local data frame [4 x 3]
# Groups: am [?]
# 
# am  gear mean_cyl
# <dbl> <dbl>    <dbl>
# 1     0     3 7.466667
# 2     0     4 5.000000
# 3     1     4 4.500000
# 4     1     5 6.000000

The .vars argument accepts both character/numeric vector or column names generated by vars:

.vars

A list of columns generated by vars(), or a character vector of column names, or a numeric vector of column positions.

Here's the quick and dirty reference I wrote for myself.

# install.packages("rlang")
library(tidyverse)

dat <- data.frame(cat = sample(LETTERS[1:2], 50, replace = TRUE),
                  cat2 = sample(LETTERS[3:4], 50, replace = TRUE),
                  value = rnorm(50))

Representing column names with strings

Convert strings to symbol objects using rlang::sym and rlang::syms.

summ_var <- "value"
group_vars <- c("cat", "cat2")

summ_sym <- rlang::sym(summ_var)  # capture a single symbol
group_syms <- rlang::syms(group_vars)  # creates list of symbols

dat %>%
  group_by(!!!group_syms) %>%  # splice list of symbols into a function call
  summarize(summ = sum(!!summ_sym)) # slice single symbol into call

If you use !! or !!! outside of dplyr functions you will get an error.

The usage of rlang::sym and rlang::syms is identical inside functions.

summarize_by <- function(df, summ_var, group_vars) {

  summ_sym <- rlang::sym(summ_var)
  group_syms <- rlang::syms(group_vars)

  df %>%
    group_by(!!!group_syms) %>%
    summarize(summ = sum(!!summ_sym))
}

We can then call summarize_by with string arguments.

summarize_by(dat, "value", c("cat", "cat2"))

Using non-standard evaluation for column/variable names

summ_quo <- quo(value)  # capture a single variable for NSE
group_quos <- quos(cat, cat2)  # capture list of variables for NSE

dat %>%
  group_by(!!!group_quos) %>%  # use !!! with both quos and rlang::syms
  summarize(summ = sum(!!summ_quo))  # use !! both quo and rlang::sym

Inside functions use enquo rather than quo. quos is okay though!?

summarize_by <- function(df, summ_var, ...) {

  summ_quo <- enquo(summ_var)  # can only capture a single value!
  group_quos <- quos(...)  # captures multiple values, also inside functions!?

  df %>%
    group_by(!!!group_quos) %>%
    summarize(summ = sum(!!summ_quo))
}

And then our function call is

summarize_by(dat, value, cat, cat2)

If you want to group by possibly more than one column, you can use quos

grouping_vars <- quos(am, gear)
mtcars %>%
  group_by(!!!grouping_vars) %>%
  summarise(mean_cyl=mean(cyl))
#      am  gear mean_cyl
#   <dbl> <dbl>    <dbl>
# 1     0     3 7.466667
# 2     0     4 5.000000
# 3     1     4 4.500000
# 4     1     5 6.000000

Right now, it doesn't seem like there's a great way to turn strings into quos. Here's one way that does work though

cols <- c("am","gear")
grouping_vars <- rlang::parse_quosures(paste(cols, collapse=";"))
mtcars %>%
  group_by(!!!grouping_vars) %>%
  summarise(mean_cyl=mean(cyl))
#      am  gear mean_cyl
#   <dbl> <dbl>    <dbl>
# 1     0     3 7.466667
# 2     0     4 5.000000
# 3     1     4 4.500000
# 4     1     5 6.000000
标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!