`dplyr::case_when` Evaluation error: object 'x' not found

Question

Does anyone know why dplyr::case_when() produces the error in the following code?

tibble(tmp1 = sample(c(T, F), size = 32, replace = T),
       tmp2 = sample(c(T, F), size = 32, replace = T),
       tmp3 = sample(c(T, F), size = 32, replace = T)) %>%
  mutate(tmp = apply(cbind(tmp1, tmp2, tmp3), 1, function(x) {
    case_when(
      all(x == F) ~ "N",
      any(x == T) ~ "Y"
    )
  }))

Error in mutate_impl(.data, dots) : 
  Evaluation error: object 'x' not found.

I am using R 3.4.3 with dplyr 0.7.4 on Ubuntu 16.04.

The error message is quite confusing, since the following code works fine, which indicates that x is not missing:

tibble(tmp1 = sample(c(T, F), size = 32, replace = T),
       tmp2 = sample(c(T, F), size = 32, replace = T),
       tmp3 = sample(c(T, F), size = 32, replace = T)) %>%
  mutate(tmp = apply(cbind(tmp1, tmp2, tmp3), 1, function(x) {
    if (all(x == F)) {
      "N"
    } else if(any(x == T)) {
      "Y"
    }
  }))

Just for reference, the following code also works fine:

cbind(tmp1 = sample(c(T, F), size = 32, replace = T),
      tmp2 = sample(c(T, F), size = 32, replace = T),
      tmp3 = sample(c(T, F), size = 32, replace = T)) %>%
  apply(1, function(x) {
    case_when(
      all(x == F) ~ "N",
      any(x == T) ~ "Y"
    )
  })

Answer 1:

The issue is that case_when is vectorized and does not operate row by row. However, we can simplify the code by combining rowSums (which does operate row-wise) with case_when.

library(dplyr)

set.seed(151)

tibble(tmp1 = sample(c(T, F), size = 32, replace = T),
       tmp2 = sample(c(T, F), size = 32, replace = T),
       tmp3 = sample(c(T, F), size = 32, replace = T)) %>%
  mutate(tmp = case_when(
    rowSums(.) == 0 ~ "N",
    rowSums(.) > 0  ~ "Y"
  ))

# # A tibble: 32 x 4
#   tmp1  tmp2  tmp3  tmp  
#   <lgl> <lgl> <lgl> <chr>
#  1 TRUE  TRUE  FALSE Y    
#  2 FALSE FALSE TRUE  Y    
#  3 FALSE FALSE TRUE  Y    
#  4 FALSE FALSE TRUE  Y    
#  5 TRUE  FALSE FALSE Y    
#  6 FALSE FALSE FALSE N    
#  7 TRUE  FALSE FALSE Y    
#  8 FALSE TRUE  FALSE Y    
#  9 TRUE  TRUE  FALSE Y    
# 10 FALSE FALSE TRUE  Y    
# # ... with 22 more rows

Alternatively, since there are only two conditions, rowSums with ifelse is sufficient.

set.seed(151)

tibble(tmp1 = sample(c(T, F), size = 32, replace = T),
       tmp2 = sample(c(T, F), size = 32, replace = T),
       tmp3 = sample(c(T, F), size = 32, replace = T)) %>%
  mutate(tmp = ifelse(rowSums(.) == 0, "N", "Y"))
# # A tibble: 32 x 4
#   tmp1  tmp2  tmp3  tmp  
#   <lgl> <lgl> <lgl> <chr>
#  1 TRUE  TRUE  FALSE Y    
#  2 FALSE FALSE TRUE  Y    
#  3 FALSE FALSE TRUE  Y    
#  4 FALSE FALSE TRUE  Y    
#  5 TRUE  FALSE FALSE Y    
#  6 FALSE FALSE FALSE N    
#  7 TRUE  FALSE FALSE Y    
#  8 FALSE TRUE  FALSE Y    
#  9 TRUE  TRUE  FALSE Y    
# 10 FALSE FALSE TRUE  Y    
# # ... with 22 more rows
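
Both variants above rely on rowSums coercing logical values to numbers (TRUE counts as 1, FALSE as 0), so a row sum of 0 means every value in that row is FALSE. A minimal base R sketch of that idea, using a small hand-made matrix m (a hypothetical example, not data from the question):

```r
# Hypothetical 3 x 2 logical matrix, used only for illustration
m <- cbind(a = c(TRUE, FALSE, FALSE),
           b = c(TRUE, FALSE, TRUE))

# rowSums() counts the TRUE values in each row
counts <- rowSums(m)

# A count of 0 means the whole row is FALSE
labels <- ifelse(counts == 0, "N", "Y")
```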

Answer 2:

How about using Reduce and logical OR?

set.seed(151)
tibble(tmp1 = sample(c(T, F), size = 32, replace = T),
       tmp2 = sample(c(T, F), size = 32, replace = T),
       tmp3 = sample(c(T, F), size = 32, replace = T)) %>%
    mutate(tmp = Reduce(`|`, list(tmp1, tmp2, tmp3)))
## A tibble: 32 x 4
#   tmp1  tmp2  tmp3  tmp
#   <lgl> <lgl> <lgl> <lgl>
# 1 TRUE  TRUE  FALSE TRUE
# 2 FALSE FALSE TRUE  TRUE
# 3 FALSE FALSE TRUE  TRUE
# 4 FALSE FALSE TRUE  TRUE
# 5 TRUE  FALSE FALSE TRUE
# 6 FALSE FALSE FALSE FALSE
# 7 TRUE  FALSE FALSE TRUE
# 8 FALSE TRUE  FALSE TRUE
# 9 TRUE  TRUE  FALSE TRUE
#10 FALSE FALSE TRUE  TRUE
## ... with 22 more rows
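
If the "Y"/"N" labels from the question are wanted instead of a logical column, the Reduce result could be passed to dplyr::if_else. This is only a sketch of one possible variation; the names df and result are introduced here for illustration and do not appear in the original answers.

```r
library(dplyr)

set.seed(151)

df <- tibble(tmp1 = sample(c(TRUE, FALSE), size = 32, replace = TRUE),
             tmp2 = sample(c(TRUE, FALSE), size = 32, replace = TRUE),
             tmp3 = sample(c(TRUE, FALSE), size = 32, replace = TRUE))

# Reduce() folds `|` over the columns, giving TRUE where any column is TRUE;
# if_else() then maps TRUE to "Y" and FALSE to "N"
result <- df %>%
  mutate(tmp = if_else(Reduce(`|`, list(tmp1, tmp2, tmp3)), "Y", "N"))
```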

Answer 3:

As it turns out, this is a bug, probably related to the hybrid evaluator: https://github.com/tidyverse/dplyr/issues/3422
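
Since the question shows that apply with case_when works outside of mutate, one way to sidestep the problem is to compute the column outside of mutate and assign it afterwards. The following is a sketch under that assumption, not a fix confirmed in the linked issue; df and label_row are hypothetical names introduced for illustration.

```r
library(dplyr)

set.seed(151)

df <- tibble(tmp1 = sample(c(TRUE, FALSE), size = 32, replace = TRUE),
             tmp2 = sample(c(TRUE, FALSE), size = 32, replace = TRUE),
             tmp3 = sample(c(TRUE, FALSE), size = 32, replace = TRUE))

# Hypothetical helper: label one row (a logical vector) as "N" or "Y"
label_row <- function(x) {
  case_when(
    all(!x) ~ "N",
    any(x)  ~ "Y"
  )
}

# apply() runs outside of mutate(), so mutate's evaluator is not involved
df$tmp <- apply(as.matrix(df), 1, label_row)
```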

Source: https://stackoverflow.com/questions/49268989/dplyrcase-when-evaluation-error-object-x-not-found

Tags

dplyr

tidyverse