Correct usage of dplyr::select in dplyr 0.7.0+, selecting columns using character vector

Posted by 会有一股神秘感。 on 2019-12-09 10:01:47

Question


Suppose we have a character vector cols_to_select containing some columns we want to select from a dataframe df, e.g.

df <- tibble::tibble(a=1:3, b=1:3, c=1:3, d=1:3, e=1:3)  # tibble() replaces the deprecated data_frame()
cols_to_select <- c("b", "d")

Suppose also that we want to use dplyr::select because it is part of an operation that uses %>%, so using select keeps the code easy to read.

There seem to be a number of ways in which this can be achieved, but some are more robust than others. Please could you let me know which is the 'correct' version and why? Or perhaps there is another, better way?

dplyr::select(df, cols_to_select) #Fails if 'cols_to_select' happens to be the name of a column in df 
dplyr::select(df, !!cols_to_select) # i.e. using UQ()
dplyr::select(df, !!!cols_to_select) # i.e. using UQS()

cols_to_select_syms <- rlang::syms(c("b", "d"))  #See [here](https://stackoverflow.com/questions/44656993/how-to-pass-a-named-vector-to-dplyrselect-using-quosures/44657171#44657171)
dplyr::select(df, !!!cols_to_select_syms)

P.S. I realise this can be achieved in base R simply with df[, cols_to_select]


Answer 1:


There is an example with dplyr::select in https://cran.r-project.org/web/packages/rlang/vignettes/tidy-evaluation.html that uses:

dplyr::select(df, !!cols_to_select)

Why? Let's explore the options you mention:

Option 1

dplyr::select(df, cols_to_select)

As you say, this fails if cols_to_select happens to be the name of a column in df: select then picks that column instead of the columns listed in the vector. So this option is not robust.
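The ambiguity can be reproduced with a small sketch. The data frame below is a hypothetical variant of the one in the question, with a column deliberately named cols_to_select; it assumes dplyr and tibble are installed.

```r
library(dplyr)

# Hypothetical data: one column shares its name with the character vector
df_clash <- tibble::tibble(a = 1:3, b = 1:3, cols_to_select = 1:3)
cols_to_select <- c("a", "b")

# The bare name is looked up among the columns first, so the column wins
names(select(df_clash, cols_to_select))    # "cols_to_select"

# Unquoting inlines the vector's value before select() sees the argument
names(select(df_clash, !!cols_to_select))  # "a" "b"
```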

Option 4

cols_to_select_syms <- rlang::syms(c("b", "d"))  
dplyr::select(df, !!!cols_to_select_syms)

This looks more convoluted than the other solutions.

Options 2 and 3

dplyr::select(df, !!cols_to_select)
dplyr::select(df, !!!cols_to_select)

These two solutions provide the same results in this case. You can see the output of !!cols_to_select and !!!cols_to_select by doing:

dput(rlang::`!!`(cols_to_select)) # c("b", "d")
dput(rlang::`!!!`(cols_to_select)) # pairlist("b", "d")

The !! or UQ() operator evaluates its argument immediately in its context, and that is what you want.

The !!! or UQS() operator is used to pass multiple arguments at once to a function.

For character column names like in your example, it does not matter whether you give them as a single vector of length 2 (using !!) or as a list of two vectors of length one (using !!!). For more complex use cases you will need to pass multiple arguments as a list, using !!!:
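As a quick check on the question's data, the sketch below runs both operators side by side. The names via_unquote and via_splice are only illustrative, and the example assumes dplyr and tibble are installed.

```r
library(dplyr)

df <- tibble::tibble(a = 1:3, b = 1:3, c = 1:3, d = 1:3, e = 1:3)
cols_to_select <- c("b", "d")

# !! passes the whole vector as a single argument: select(df, c("b", "d"))
via_unquote <- select(df, !!cols_to_select)

# !!! splices each element in as its own argument: select(df, "b", "d")
via_splice <- select(df, !!!cols_to_select)

names(via_unquote)  # "b" "d"
names(via_splice)   # "b" "d"
```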

a <- rlang::quos(dplyr::contains("c"), dplyr::starts_with("b"))  # a list of two quoted selection helpers
dplyr::select(df, !!a) # does not work
dplyr::select(df, !!!a) # does work
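To see what the splice produces here, a minimal sketch on the question's data follows. The name helpers is hypothetical, and the example assumes dplyr, rlang and tibble are installed.

```r
library(dplyr)

df <- tibble::tibble(a = 1:3, b = 1:3, c = 1:3, d = 1:3, e = 1:3)

# Two quoted selection helpers, captured without being evaluated
helpers <- rlang::quos(contains("c"), starts_with("b"))
length(helpers)  # 2

# !!! hands each quosure to select() as a separate argument,
# so columns come back in the order the helpers were given
names(select(df, !!!helpers))  # "c" "b"
```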


Source: https://stackoverflow.com/questions/44739871/correct-usage-of-dplyrselect-in-dplyr-0-7-0-selecting-columns-using-characte
