Creating a function with an argument passed to dplyr::filter: what is the best way to work around NSE?

陌清茗 2020-12-30 10:13

Non-standard evaluation is really handy when using dplyr's verbs, but it can be problematic when using those verbs with function arguments. For example, let us say that I

3 Answers
  • 2020-12-30 10:49

    The answer from @eddi is correct about what's going on here. I'm writing another answer that addresses the larger request of how to write functions using dplyr verbs. You'll note that, ultimately, it uses something like nrowspecies2 to avoid the species == species tautology.

    To write a function wrapping dplyr verb(s) that will work with NSE, write two functions:

    First, write a version that takes quoted inputs, using lazyeval and the standard-evaluation (SE) version of the dplyr verb, in this case filter_.

    nrowspecies_robust_ <- function(data, species){ 
      species_ <- lazyeval::as.lazy(species) 
      condition <- ~ species == species_ # *
      tmp <- dplyr::filter_(data, condition) # **
      nrow(tmp)
    } 
    nrowspecies_robust_(iris, ~versicolor) 
    

    Second, make a version that uses NSE:

    nrowspecies_robust <- function(data, species) { 
      species <- lazyeval::lazy(species) 
      nrowspecies_robust_(data, species) 
    } 
    nrowspecies_robust(iris, versicolor) 
    

    * If you want to do something more complex, you may need lazyeval::interp here, as in the tips linked below.

    ** Also, if you need to change output names, see the .dots argument.

    • For the above, I followed some tips from Hadley

    • Another good resource is the dplyr vignette on NSE, which illustrates .dots, interp, and other functions from the lazyeval package

    • For even more details on lazyeval, see its vignette

    • For a thorough discussion of the base R tools for working with NSE (many of which lazyeval helps you avoid), see the chapter on NSE in Advanced R
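
    As a hedged, dependency-free sketch of the same two-layer idea (not this answer's lazyeval code), the standard-evaluation layer can take the species as an ordinary string while a thin non-standard wrapper captures the bare name the caller typed, using base R's substitute() and deparse(). The names count_species_() and count_species() are hypothetical, and the sketch uses the built-in iris data with its original capitalised Species column:

```r
# Hypothetical helpers mirroring the nrowspecies_robust_() / nrowspecies_robust() pair.

# Standard-evaluation layer: `species` is an ordinary character value,
# and the column is named explicitly, so the two can never be confused.
count_species_ <- function(data, species) {
  sum(data$Species == species)
}

# Non-standard-evaluation layer: capture the unevaluated argument,
# turn the bare name into a string, and delegate to the SE layer.
count_species <- function(data, species) {
  count_species_(data, deparse(substitute(species)))
}

count_species_(iris, "versicolor")  # 50
count_species(iris, versicolor)     # 50
```

    Keeping all the quoting logic in the thin wrapper means the SE layer can still be called from other functions with a computed string.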

  • 2020-12-30 10:51

    This question has absolutely nothing to do with non-standard evaluation. Let me rewrite your initial function to make that clear:

    library(dplyr)

    nrowspecies4 <- function(dtf, boo){
        dtf %>%
            filter(boo == boo) %>%  # boo is not a column: the argument is compared with itself
            nrow()
    }
    nrowspecies4(iris, boo = "versicolor")
    # 150
    

    The expression inside your filter always evaluates to TRUE (almost always; see the example below). That's why it doesn't work, not because of some NSE magic.

    Your nrowspecies2 is the way to go.
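
    Another way to avoid the tautology without renaming the argument, sketched here under the assumption of dplyr 1.0 or later, is to use the .data and .env pronouns: they tell filter() explicitly which side is the column and which is the function argument, so species == species can no longer compare a thing with itself. nrow_species() is a hypothetical name, and the sketch uses iris's default Species column:

```r
library(dplyr)

# Hypothetical helper; assumes dplyr >= 1.0.
nrow_species <- function(dtf, species) {
  dtf %>%
    # .data$Species is always the column, .env$species always this argument
    filter(.data$Species == .env$species) %>%
    nrow()
}

nrow_species(iris, "versicolor")  # 50
```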

    FWIW, species in your nrowspecies0 is indeed evaluated as a column, not as the input variable species. You can check that by comparing nrowspecies0(iris, NA) with nrowspecies4(iris, NA).
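
    As a hedged illustration, the same behaviour can be reproduced without dplyr: base R's subset() also evaluates its condition inside the data frame first and the calling environment second, and it drops rows where the condition is NA. The toy data frame and the helper names count_tautology() and count_column() are hypothetical:

```r
# Toy data; the `species` column is lower case, like the question's iris.
df <- data.frame(species = c("setosa", "versicolor", "virginica", "virginica"))

# `boo` is not a column, so both sides are the argument:
# a single TRUE (or NA) recycled over every row.
count_tautology <- function(dtf, boo) {
  nrow(subset(dtf, boo == boo))
}

# `species` IS a column, so both sides are the column
# and the argument is never used.
count_column <- function(dtf, species) {
  nrow(subset(dtf, species == species))
}

count_tautology(df, "versicolor")  # 4: nothing is filtered out
count_tautology(df, NA)            # 0: NA == NA is NA, and subset() drops NA rows
count_column(df, NA)               # 4: the column is compared with itself
```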

  • 2020-12-30 11:08

    In his 2016 useR! talk (at 38:30), Hadley Wickham explains the concept of referential transparency. Using a formula, the filter function can be reformulated as:

    library(dplyr)

    nrowspecies5 <- function(dtf, formula){
        dtf %>%
            filter_(formula) %>%  # SE verb: takes a one-sided formula (deprecated in current dplyr)
            nrow()
    }
    

    This has the added benefit of being more generic:

    # Make column names lower case
    names(iris) = tolower(names(iris)) 
    nrowspecies5(iris, ~ species == "versicolor")
    # 50
    nrowspecies5(iris, ~ sepal.length > 6 & species == "virginica")
    # 41
    nrowspecies5(iris, ~ sepal.length > 6 & species == "setosa")
    # 0
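
    filter_() and the other underscored verbs are deprecated in current dplyr. A hedged sketch of the same generic helper with today's tidy evaluation, assuming dplyr 1.0 or later, passes the caller's condition through with the {{ }} ("embrace") operator instead of a formula; nrow_where() and iris_lc are hypothetical names:

```r
library(dplyr)

# Hypothetical counterpart of nrowspecies5(); assumes dplyr >= 1.0.
nrow_where <- function(dtf, condition) {
  dtf %>%
    # {{ }} forwards the caller's unevaluated expression into filter()
    filter({{ condition }}) %>%
    nrow()
}

# Lower-cased copy of iris, matching the column names used above.
iris_lc <- iris
names(iris_lc) <- tolower(names(iris_lc))

nrow_where(iris_lc, species == "versicolor")                    # 50
nrow_where(iris_lc, sepal.length > 6 & species == "virginica")  # 41
```

    Unlike the formula version, the condition is written as a bare expression, just as it would be in a direct filter() call.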
    