Ordering problems when using mutate with ifelse condition to date

北战南征 submitted on 2019-12-13 03:55:47

Question


I'm trying to use mutate to create a column that takes the value of one column up to a point and then uses cumprod to fill the rest of the observations based on the values of another column.

I tried combining mutate with ifelse, but the values come out in the wrong order and I can't figure out why.

Below I reproduce a more basic example that replicates my problem:

library(dplyr)
foo1 <- data.frame(date = seq(2005, 2018, 1))
foo1 %>% mutate(h = ifelse(date > 2008, seq(1, 11, 1), 99))

The output is:

   date  h
1  2005 99
2  2006 99
3  2007 99
4  2008 99
5  2009  5
6  2010  6
7  2011  7
8  2012  8
9  2013  9
10 2014 10
11 2015 11
12 2016  1
13 2017  2
14 2018  3

And I'd like it to be:

   date  h
1  2005 99
2  2006 99
3  2007 99
4  2008 99
5  2009  1
6  2010  2
7  2011  3
8  2012  4
9  2013  5
10 2014  6
11 2015  7
12 2016  8
13 2017  9
14 2018 10

Edit:

Below I reproduce another example (closer to what I'm trying to do).

foo2 <- data.frame(date=seq(2005,2013,1), a=seq(1, by=1, length.out = 9), b=rep(1.01, length.out = 9))
foo2 %>% mutate(h=ifelse(date>2008, cumprod(c(a[5],b[5:9])), a))

The output I have is:

  date a    b       h
1 2005 1 1.01 1.00000
2 2006 2 1.01 2.00000
3 2007 3 1.01 3.00000
4 2008 4 1.01 4.00000
5 2009 5 1.01 5.20302
6 2010 6 1.01 5.25505
7 2011 7 1.01 5.00000
8 2012 8 1.01 5.05000
9 2013 9 1.01 5.10050

And I'd like it to be:

  date a    b       h
1 2005 1 1.01 1.00000
2 2006 2 1.01 2.00000
3 2007 3 1.01 3.00000
4 2008 4 1.01 4.00000
5 2009 5 1.01 5.00000
6 2010 6 1.01 5.05000
7 2011 7 1.01 5.10050
8 2012 8 1.01 5.20302
9 2013 9 1.01 5.25505

If I use if_else instead of ifelse, I receive the following error:

Error in mutate_impl(.data, dots) : 
  Evaluation error: `true` must be length 9 (length of `condition`) or one, not 6

Answer 1:


You were nearly there:

foo1 %>% mutate(h = if_else(date > 2008, cumsum(date > 2008), 99L))
#   date  h
#1  2005 99
#2  2006 99
#3  2007 99
#4  2008 99
#5  2009  1
#6  2010  2
#7  2011  3
#8  2012  4
#9  2013  5
#10 2014  6
#11 2015  7
#12 2016  8
#13 2017  9
#14 2018 10

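PS. It's recommended to use if_else instead of base R's ifelse: if_else is stricter and raises an error when true or false is neither length one nor the length of the condition, instead of silently recycling it.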
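The same idea can be carried over to the second example (foo2). The following is only a sketch under one reading of the question: h equals a up to 2008, starts at the 2009 value of a, and is then multiplied by b for each following year. The helper column growth is a hypothetical name introduced here, and a is built as a double so that if_else sees matching types. Under this reading the values for 2012 and 2013 come out as 5.151505 and 5.20302005, which may not be exactly what the asker listed.

```r
library(dplyr)

foo2 <- data.frame(
  date = seq(2005, 2013, 1),
  a    = as.numeric(1:9),
  b    = rep(1.01, 9)
)

res <- foo2 %>%
  mutate(
    # Assumption: the growth factor is 1 up to and including 2009, b afterwards.
    growth = if_else(date > 2009, b, 1),
    # Both branches have one value per row, so if_else has nothing to recycle.
    h = if_else(date > 2008, a[date == 2009] * cumprod(growth), a)
  ) %>%
  select(-growth)

# h for 2009..2013: 5, 5.05, 5.1005, 5.151505, 5.20302005
res
```

Because cumprod(growth) is computed over all nine rows, the true branch already lines up with date, which is what the if_else length check asks for.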




Answer 2:


The ifelse function takes three arguments:

  1. test: a logical vector. Say that it has a length of N.
  2. yes: a vector. It can be of any length. If the length is not N, the vector is recycled/shortened to be of length N
  3. no: same as yes.

At the end of this preprocessing stage, you have three vectors of the same length. ifelse then builds the return value element by element, taking the element of the second vector (yes) where test is TRUE and the element of the third vector (no) otherwise.
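The behaviour described above can be approximated in a few lines of base R. This is a simplified sketch, not the actual source of ifelse (it ignores NA handling and attributes), and ifelse_sketch is a hypothetical name used only for illustration.

```r
# Hypothetical helper: a simplified emulation of ifelse(), not its real source.
ifelse_sketch <- function(test, yes, no) {
  n <- length(test)
  yes_full <- rep_len(yes, n)  # recycle (or truncate) yes to length n
  no_full  <- rep_len(no, n)   # recycle (or truncate) no to length n
  out <- no_full
  out[test] <- yes_full[test]  # take yes only at positions where test is TRUE
  out
}

date <- seq(2005, 2018, 1)
h_sketch <- ifelse_sketch(date > 2008, seq(1, 11, 1), 99)
h_base   <- ifelse(date > 2008, seq(1, 11, 1), 99)

# Both give 99 99 99 99 5 6 7 8 9 10 11 1 2 3
all.equal(h_sketch, h_base)
```

The point to notice is that yes is indexed by row position, not by "the n-th TRUE", so the first four elements of yes are simply never used.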

In your case we have:

test <- foo1$date>2008 #length: 14
yes <- seq(1,11,1) #length: 11
no <- 99 #length: 1

So, it needs to recycle both yes and no. You end up with something like:

 test yes no
FALSE   1 99
FALSE   2 99
FALSE   3 99
FALSE   4 99
 TRUE   5 99
 TRUE   6 99
 TRUE   7 99
 TRUE   8 99
 TRUE   9 99
 TRUE  10 99
 TRUE  11 99
 TRUE   1 99
 TRUE   2 99
 TRUE   3 99

You can see how the recycling works. To build the return value, ifelse goes through the rows in the order above and picks the yes element where test is TRUE and the no element otherwise. This explains the return value you got. It has nothing to do with dplyr.
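Given that explanation, two ways around the problem suggest themselves: supply a yes vector that already has one value per row, or start from the default value and overwrite only the TRUE positions. The sketch below uses base R only; the variable names after, h1 and h2 are illustrative assumptions.

```r
date  <- seq(2005, 2018, 1)
after <- date > 2008

# Option 1: a yes vector with one value per row, so nothing is recycled.
# cumsum(after) is 0 for the first four rows and 1..10 afterwards.
h1 <- ifelse(after, cumsum(after), 99)

# Option 2: start from the default and overwrite only the TRUE positions.
# seq_len(sum(after)) has exactly one value per TRUE row.
h2 <- replace(rep(99, length(date)), after, seq_len(sum(after)))

# Both give 99 99 99 99 1 2 3 4 5 6 7 8 9 10
all.equal(h1, h2)
```

Either expression can also be used as the right-hand side inside mutate(), since it only depends on the date column.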



Source: https://stackoverflow.com/questions/53520036/ordering-problems-when-using-mutate-with-ifelse-condition-to-date
