Is ifelse ever appropriate in a non-vectorized situation, and vice versa?

萌比男神i 2020-12-19 03:24

(Background info: ifelse evaluates both of the expressions, even though only one will be returned. EDIT: This is an incorrect statement. See Tommy's r

2 Answers
  • 2020-12-19 03:57

    First, ifelse does NOT always evaluate both expressions - only if there are both TRUE and FALSE elements in the test vector.

    ifelse(TRUE, 'foo', stop('bar')) # "foo"
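
    To make the evaluation behaviour visible, here is a minimal sketch. The helpers yes_branch and no_branch and the calls log are hypothetical names introduced only for this illustration; each helper records its name when it is actually evaluated.

    ```r
    # Hypothetical helpers: each one logs its name when it is evaluated.
    calls <- character(0)
    yes_branch <- function() { calls <<- c(calls, "yes"); "foo" }
    no_branch  <- function() { calls <<- c(calls, "no");  "bar" }

    # Test is all TRUE: only the yes argument is evaluated.
    r1 <- ifelse(TRUE, yes_branch(), no_branch())
    calls_scalar <- calls

    # Test has both TRUE and FALSE: both arguments are evaluated.
    calls <- character(0)
    r2 <- ifelse(c(TRUE, FALSE), yes_branch(), no_branch())
    calls_mixed <- calls
    ```

    Inspecting calls_scalar and calls_mixed afterwards shows which arguments were touched in each call.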
    

    And in my opinion:

    ifelse should not be used in a non-vectorized situation. It is always slower and more error-prone to use ifelse instead of if / else:

    # This is fairly common if/else code
    if (length(letters) > 0) letters else LETTERS
    
    # But this "equivalent" code will yield a very different result - TRY IT!
    ifelse(length(letters) > 0, letters, LETTERS)
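
    For readers who cannot run it right now, the difference can be sketched as follows. The variable names from_if and from_ifelse are only for illustration: the test has length 1, so ifelse returns a result of length 1.

    ```r
    # if / else returns the whole chosen object.
    from_if <- if (length(letters) > 0) letters else LETTERS

    # ifelse returns a result shaped like the test, which has length 1 here.
    from_ifelse <- ifelse(length(letters) > 0, letters, LETTERS)

    length(from_if)      # 26
    length(from_ifelse)  # 1
    from_ifelse          # "a"
    ```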
    

    In vectorized situations though, ifelse can be a good choice - but beware that the length and attributes of the result might not be what you expect (as above, and I consider ifelse broken in that respect).

    Here's an example: tst is of length 5 and has a class. I'd expect the result to be of length 10 and have no class, but that isn't what happens - it gets an incompatible class and length 5!

    # a logical vector of class 'mybool'
    tst <- structure(1:5 %% 2 > 0, class='mybool')
    
    # produces a numeric vector of class 'mybool'!
    ifelse(tst, 101:110, 201:210)
    #[1] 101 202 103 204 105
    #attr(,"class")
    #[1] "mybool"
    

    Why would I expect the length to be 10? Because most functions in R "cycle" the shorter vector to match the longer:

    1:5 + 1:10 # returns a vector of length 10.
    

    ...But ifelse only cycles the yes/no arguments to match the length of the tst argument.
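
    A small sketch of that recycling rule, using made-up vectors (short_test and long_test are illustrative names). If a result as long as yes/no is wanted, one option is to recycle the test yourself, here with rep_len:

    ```r
    short_test <- c(TRUE, FALSE)
    long_test  <- c(TRUE, FALSE, TRUE, FALSE)

    # yes/no are longer than the test: the result still has length 2.
    res_short <- ifelse(short_test, 1:4, 5:8)

    # yes/no are shorter than the test: they are cycled to length 4.
    res_long <- ifelse(long_test, 1:2, 101:102)

    # Recycling the test explicitly gives a result as long as yes/no.
    res_recycled <- ifelse(rep_len(short_test, 4), 1:4, 5:8)
    ```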

    Why would I expect the class (and other attributes) to not be copied from the test object? Because <, which returns a logical vector, does not copy class and attributes from its (typically numeric) arguments. It doesn't do that because it would typically be very wrong.

    1:5 < structure(1:10, class='mynum') # returns a logical vector without class
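
    If the class on the test is the problem, one possible workaround (a sketch, not the only way) is to strip it with unclass before calling ifelse. The vector classed_test below mirrors the 'mybool' example above; the other names are illustrative.

    ```r
    # Same kind of classed logical vector as in the example above.
    classed_test <- structure(1:5 %% 2 > 0, class = "mybool")

    # With the class in place, the result inherits it from the test.
    with_class <- ifelse(classed_test, 101:105, 201:205)

    # Dropping the class first gives a plain integer vector.
    without_class <- ifelse(unclass(classed_test), 101:105, 201:205)

    inherits(with_class, "mybool")     # TRUE
    is.null(attributes(without_class)) # TRUE
    ```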
    

    Finally, can it be more efficient to "do it yourself"? Well, it seems that ifelse is not a primitive like if, and it needs some special code to handle NA. If you don't have NAs, it can be faster to do it yourself.

    tst <- 1:1e7 %% 2 == 0
    a <- rep(1, 1e7)
    b <- rep(2, 1e7)
    system.time( r1 <- ifelse(tst, a, b) )            # 2.58 sec
    
    # If we know that a and b are of the same length as tst, and that
    # tst doesn't have NAs, then we can do like this:
    system.time( { r2 <- b; r2[tst] <- a[tst]; r2 } ) # 0.46 secs
    
    identical(r1, r2) # TRUE
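
    The do-it-yourself approach can be wrapped in a small function that checks the assumptions it relies on. This is only a sketch: simple_ifelse is a hypothetical name, not part of base R, and it refuses inputs with NAs or mismatched lengths rather than handling them.

    ```r
    # Hypothetical helper: assignment-based replacement for ifelse.
    # Assumes a logical test without NAs and yes/no of the same length.
    simple_ifelse <- function(test, yes, no) {
      stopifnot(is.logical(test), !anyNA(test),
                length(yes) == length(test),
                length(no) == length(test))
      out <- no                # start from the "no" values
      out[test] <- yes[test]   # overwrite where the test is TRUE
      out
    }

    small_test <- c(TRUE, FALSE, TRUE)
    s1 <- simple_ifelse(small_test, c(1, 1, 1), c(2, 2, 2))
    s2 <- ifelse(small_test, c(1, 1, 1), c(2, 2, 2))
    identical(s1, s2)  # TRUE
    ```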
    
  • 2020-12-19 03:59

    On your second point, how do you define "best"? I think ifelse() is one of the more readable solutions, but may not always be the fastest. Specifically, I've found that writing out boolean conditions and adding them together can give you some performance benefits. Here's a quick example:

    > x <- rnorm(1e6)
    > system.time(y1 <- ifelse(x > 0,1,2))
       user  system elapsed 
       0.46    0.08    0.53 
    > system.time(y2 <- (x > 0) * 1 + (x <= 0) * 2)
       user  system elapsed 
       0.06    0.00    0.06 
    > identical(y1, y2)
    [1] TRUE
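
    The trick works because R coerces TRUE to 1 and FALSE to 0 in arithmetic, so exactly one of the two terms is non-zero for each element. A tiny sketch with a hand-made vector (small_x and the result names are illustrative):

    ```r
    small_x <- c(-1.5, 0, 2.3)

    # (small_x > 0) is FALSE FALSE TRUE, (small_x <= 0) is TRUE TRUE FALSE.
    by_arithmetic <- (small_x > 0) * 1 + (small_x <= 0) * 2
    by_ifelse     <- ifelse(small_x > 0, 1, 2)

    by_arithmetic                        # 2 2 1
    identical(by_arithmetic, by_ifelse)  # TRUE
    ```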
    

    So, if speed is your biggest concern, the boolean approach may be better. However, for most of my purposes, I've found ifelse() to be quick enough and easy to grok. Your mileage may vary, obviously.
