Question
I'm trying to use mutate to create a column that takes the value of one column up to a point and then uses cumprod to fill the rest of the observations based on the values of another column.
I tried combining mutate with ifelse, but the values come out in the wrong order and I can't figure out why.
Below is a more basic example that reproduces my problem:
library(dplyr)
foo1 <- data.frame(date=seq(2005,2018,1))
foo1 %>% mutate(h=ifelse(date>2008, seq(1,11,1), 99))
The output is:
date h
1 2005 99
2 2006 99
3 2007 99
4 2008 99
5 2009 5
6 2010 6
7 2011 7
8 2012 8
9 2013 9
10 2014 10
11 2015 11
12 2016 1
13 2017 2
14 2018 3
And I'd like it to be:
date h
1 2005 99
2 2006 99
3 2007 99
4 2008 99
5 2009 1
6 2010 2
7 2011 3
8 2012 4
9 2013 5
10 2014 6
11 2015 7
12 2016 8
13 2017 9
14 2018 10
Edit:
Below is another example, closer to what I'm actually trying to do.
foo2 <- data.frame(date=seq(2005,2013,1), a=seq(1, by=1, length.out = 9), b=rep(1.01, length.out = 9))
foo2 %>% mutate(h=ifelse(date>2008, cumprod(c(a[5],b[5:9])), a))
The output I have is:
date a b h
1 2005 1 1.01 1.00000
2 2006 2 1.01 2.00000
3 2007 3 1.01 3.00000
4 2008 4 1.01 4.00000
5 2009 5 1.01 5.20302
6 2010 6 1.01 5.25505
7 2011 7 1.01 5.00000
8 2012 8 1.01 5.05000
9 2013 9 1.01 5.10050
And I'd like it to be:
date a b h
1 2005 1 1.01 1.00000
2 2006 2 1.01 2.00000
3 2007 3 1.01 3.00000
4 2008 4 1.01 4.00000
5 2009 5 1.01 5.00000
6 2010 6 1.01 5.05000
7 2011 7 1.01 5.10050
8 2012 8 1.01 5.20302
9 2013 9 1.01 5.25505
If I use if_else instead of ifelse, I receive the following error:
Error in mutate_impl(.data, dots) :
Evaluation error: `true` must be length 9 (length of `condition`) or one, not 6
Answer 1:
You were nearly there:
foo1 %>% mutate(h = if_else(date > 2008, cumsum(date > 2008), 99L))
# date h
#1 2005 99
#2 2006 99
#3 2007 99
#4 2008 99
#5 2009 1
#6 2010 2
#7 2011 3
#8 2012 4
#9 2013 5
#10 2014 6
#11 2015 7
#12 2016 8
#13 2017 9
#14 2018 10
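PS. It's recommended to use dplyr's if_else instead of base R's ifelse: it is stricter about the type and length of its arguments, so a length mismatch raises an error instead of being silently recycled.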
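The same idea might carry over to the second example (foo2). The sketch below is not from the original answer. It assumes that h should follow a through 2009 and that each later year multiplies the previous h by that year's b. The helper column growth and the result name res2 are hypothetical names introduced for illustration.

```r
library(dplyr)

foo2 <- data.frame(date = seq(2005, 2013, 1),
                   a = seq(1, by = 1, length.out = 9),
                   b = rep(1.01, length.out = 9))

# Assumption: h equals a through 2009; every later year multiplies
# the previous h by that year's b.
res2 <- foo2 %>%
  mutate(
    # growth is a hypothetical helper: a factor of 1 leaves rows up to 2009 unchanged
    growth = if_else(date > 2009, b, 1),
    # both branches have the same length as date, so nothing is recycled
    h = if_else(date > 2008, a[date == 2009] * cumprod(growth), a)
  ) %>%
  select(-growth)

res2$h
```

Because cumprod runs over the full-length growth column, both branches of if_else already line up with the rows. Under the stated assumption, the 2012 and 2013 values come out as 5 * 1.01^3 and 5 * 1.01^4, so it is worth checking that this matches the pattern you intend.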
Answer 2:
The ifelse function takes three arguments:
test: a logical vector. Say that it has a length of N.
yes: a vector. It can be of any length. If the length is not N, the vector is recycled/shortened to be of length N.
no: same as yes.
At the end of this preprocessing stage, you have three vectors of the same length. ifelse then builds the return value by selecting, element by element, from the second or the third vector depending on test.
In your case we have:
test <- foo1$date>2008 #length: 14
yes <- seq(1,11,1) #length: 11
no <- 99 #length: 1
So, it needs to recycle both yes and no. You end up with something like:
test yes no
FALSE 1 99
FALSE 2 99
FALSE 3 99
FALSE 4 99
TRUE 5 99
TRUE 6 99
TRUE 7 99
TRUE 8 99
TRUE 9 99
TRUE 10 99
TRUE 11 99
TRUE 1 99
TRUE 2 99
TRUE 3 99
You can see how the recycling works. Then, to build the return value, ifelse selects, in the order above, the yes element where test is TRUE and the no element otherwise. This explains why you get that return value. It has nothing to do with dplyr, of course.
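The recycling described above can be reproduced by hand in base R. This is only an illustrative sketch: yes_full, no_full and manual are hypothetical names, and rep_len stands in for the recycling step that ifelse performs on its yes and no arguments.

```r
foo1 <- data.frame(date = seq(2005, 2018, 1))

test <- foo1$date > 2008   # length 14
yes  <- seq(1, 11, 1)      # length 11
no   <- 99                 # length 1

# Recycle yes and no to the length of test
yes_full <- rep_len(yes, length(test))
no_full  <- rep_len(no, length(test))

# Pick element by element: yes_full where test is TRUE, no_full elsewhere
manual <- no_full
manual[test] <- yes_full[test]

manual
# Compare with what ifelse returns for the same inputs
all.equal(manual, ifelse(test, yes, no))
```

The first TRUE sits at position 5, so the first value taken from yes is its fifth element, not its first. That positional matching is the source of the unexpected ordering.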
Source: https://stackoverflow.com/questions/53520036/ordering-problems-when-using-mutate-with-ifelse-condition-to-date