(Background info: ifelse evaluates both of the expressions, even though only one will be returned. EDIT: This is an incorrect statement. See Tommy's r
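First, ifelse does NOT always evaluate both expressions. The yes expression is evaluated only if the test vector contains at least one TRUE element, and the no expression only if it contains at least one FALSE element, so both are evaluated only when the test vector has both TRUE and FALSE elements.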
ifelse(TRUE, 'foo', stop('bar')) # "foo"
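A quick way to check which branches ifelse actually evaluates is to give each branch a side effect. The helpers yes_val() and no_val() and the counter calls are made up for this illustration; the sketch uses only base R.

```r
# Hypothetical helpers that record how often each branch expression is evaluated.
calls <- c(yes = 0, no = 0)
yes_val <- function() { calls[["yes"]] <<- calls[["yes"]] + 1; "foo" }
no_val  <- function() { calls[["no"]]  <<- calls[["no"]]  + 1; "bar" }

r_all_true <- ifelse(c(TRUE, TRUE),  yes_val(), no_val())  # only yes_val() runs
r_mixed    <- ifelse(c(TRUE, FALSE), yes_val(), no_val())  # both run
r_na       <- ifelse(NA,             yes_val(), no_val())  # neither runs; result is NA

calls  # yes = 2, no = 1
```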
And in my opinion, ifelse should not be used in a non-vectorized situation. It is always slower and more error-prone to use ifelse instead of if/else:
# This is fairly common if/else code
if (length(letters) > 0) letters else LETTERS
# But this "equivalent" code will yield a very different result - TRY IT!
ifelse(length(letters) > 0, letters, LETTERS)
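For reference, here is what the two snippets return when run side by side (the variable names are only illustrative). ifelse cuts the yes value down to the length of the test, which is 1 here.

```r
res_if     <- if (length(letters) > 0) letters else LETTERS
res_ifelse <- ifelse(length(letters) > 0, letters, LETTERS)

length(res_if)   # 26: the whole letters vector
res_ifelse       # "a": only the first element survives
```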
In vectorized situations, though, ifelse can be a good choice, but beware that the length and attributes of the result might not be what you expect (as above; I consider ifelse broken in that respect).
Here's an example: tst has length 5 and a class. I'd expect the result to have length 10 and no class, but that isn't what happens: it gets an incompatible class and length 5!
# a logical vector of class 'mybool'
tst <- structure(1:5 %%2 > 0, class='mybool')
# produces a numeric vector of class 'mybool'!
ifelse(tst, 101:110, 201:210)
#[1] 101 202 103 204 105
#attr(,"class")
#[1] "mybool"
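The sketch below inspects the class and length of that result, and shows one possible workaround: stripping the class from the test with unclass() before calling ifelse. The names r_cls and r_plain are just for illustration. Note that unclass() only removes the inherited class; the length is still taken from the test.

```r
tst <- structure(1:5 %% 2 > 0, class = "mybool")

r_cls <- ifelse(tst, 101:110, 201:210)
class(r_cls)    # "mybool": inherited from the test
length(r_cls)   # 5: yes and no are truncated to length(tst)

# Removing the class from the test first gives a plain integer vector.
r_plain <- ifelse(unclass(tst), 101:110, 201:210)
class(r_plain)  # "integer"
```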
Why would I expect the length to be 10? Because most functions in R "recycle" the shorter vector to match the longer:
1:5 + 1:10 # returns a vector of length 10.
...But ifelse only recycles the yes/no arguments to match the length of the test argument.
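Both behaviours follow from how ifelse builds its result. The function below is a simplified sketch modeled on the R source of base::ifelse, and ifelse_sketch is a hypothetical name: the result starts out as a copy of test, so it inherits test's length and attributes, and yes/no are passed through rep(..., length.out = length(test)). The real function additionally coerces non-logical tests, and some R versions add a fast path for length-one tests.

```r
# Simplified sketch of the core of base::ifelse (not the full implementation).
ifelse_sketch <- function(test, yes, no) {
  ans  <- test            # result starts as test: same length, same attributes
  len  <- length(ans)
  ypos <- which(test)     # TRUE positions (NA positions are skipped)
  npos <- which(!test)    # FALSE positions (NA positions are skipped)
  # yes/no are recycled or truncated to length(test), then subset
  if (length(ypos) > 0L) ans[ypos] <- rep(yes, length.out = len)[ypos]
  if (length(npos) > 0L) ans[npos] <- rep(no,  length.out = len)[npos]
  ans
}

tst <- structure(1:5 %% 2 > 0, class = "mybool")
ifelse_sketch(tst, 101:110, 201:210)  # same result as ifelse(): length 5, class "mybool"
```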
Why would I expect the class (and other attributes) not to be copied from the test object? Because <, which returns a logical vector, does not copy the class (or most other attributes) from its (typically numeric) arguments. It doesn't, because a class meant for the numeric input would usually be wrong on a logical result.
1:5 < structure(1:10, class='mynum') # returns a logical vector without class
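Checking the comparison case directly, as a minimal base R sketch: the shorter operand is recycled to length 10, and the result carries no attributes at all.

```r
cmp <- 1:5 < structure(1:10, class = "mynum")
length(cmp)       # 10: the shorter operand is recycled
attributes(cmp)   # NULL: the "mynum" class is not copied
```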
Finally, can it be more efficient to "do it yourself"? Well, it seems that ifelse is an ordinary R function rather than a primitive like if, and it needs some special code to handle NAs. If you don't have NAs, it can be faster to do it yourself.
tst <- 1:1e7 %%2 == 0
a <- rep(1, 1e7)
b <- rep(2, 1e7)
system.time( r1 <- ifelse(tst, a, b) ) # 2.58 sec
# If we know that a and b are of the same length as tst, and that
# tst doesn't have NAs, then we can do like this:
system.time( { r2 <- b; r2[tst] <- a[tst]; r2 } ) # 0.46 secs
identical(r1, r2) # TRUE
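If you do have NAs, the same idea can be extended by restoring them afterwards. fast_ifelse() below is a hypothetical helper, not part of base R, and it assumes yes and no already have the same length as test. Unlike ifelse, its result starts as a copy of no, so attributes (and, in edge cases such as an all-NA test, the type) come from no rather than from test; treat it as a sketch for plain vectors like the ones above.

```r
# Hypothetical helper, not part of base R.
fast_ifelse <- function(test, yes, no) {
  # Assumption: yes and no already match length(test); no recycling is done.
  stopifnot(length(yes) == length(test), length(no) == length(test))
  out <- no                # start from the "no" values
  hit <- which(test)       # TRUE positions; which() skips NA
  out[hit] <- yes[hit]     # overwrite them with the "yes" values
  out[is.na(test)] <- NA   # like ifelse(), an NA test gives an NA result
  out
}

test_vec <- c(TRUE, FALSE, NA, TRUE)
fast_ifelse(test_vec, c(1, 2, 3, 4), c(10, 20, 30, 40))  # 1 20 NA 4
```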
On your second point, how do you define "best"? I think ifelse() is one of the more readable solutions, but it may not always be the fastest. Specifically, I've found that writing out the boolean conditions, multiplying each by its value, and adding the results together can give some performance benefit. Here's a quick example:
> x <- rnorm(1e6)
> system.time(y1 <- ifelse(x > 0,1,2))
user system elapsed
0.46 0.08 0.53
> system.time(y2 <- (x > 0) * 1 + (x <= 0) * 2)
user system elapsed
0.06 0.00 0.06
> identical(y1, y2)
[1] TRUE
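One caveat with the arithmetic trick, sketched below with made-up values: every element of both branches takes part in the arithmetic, so a non-finite value in the unused branch can leak into the result (0 * Inf is NaN in R, and 0 * NA is NA). ifelse only copies the selected elements, so it does not have this problem. In the rnorm() example above both branches are finite constants, which is why the two results are identical there.

```r
x       <- c(1, -1)
pos_val <- c(1, Inf)   # values to use where x > 0
neg_val <- c(2, 2)     # values to use where x <= 0

y_ifelse <- ifelse(x > 0, pos_val, neg_val)          # 1 2
y_bool   <- (x > 0) * pos_val + (x <= 0) * neg_val   # 1 NaN: 0 * Inf in element 2
```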
So, if speed is your biggest concern, the boolean approach may be better. However, for most of my purposes I've found ifelse() quick enough and easy to grok. Your mileage may vary, obviously.