Question
I'm trying to create a conditional dummy (X) with the rule:
set X=1 if Y=1 in the last two years before an NA (count it only once!).
To give an example, this is a sample of my data:
year country Y
1990 Bahamas 1
1991 Bahamas NA
1992 Bahamas NA
1993 Bahamas 0
1994 Bahamas 1
1995 Bahamas 1
1996 Bahamas NA
1997 Bahamas 1
1998 Bahamas NA
1999 Bahamas 1
2000 Bahamas NA
2001 Bahamas 1
2002 Bahamas 1
2003 Bahamas 0
2004 Bahamas NA
2005 Bahamas 0
2006 Bahamas 0
2007 Bahamas 1
2008 Bahamas NA
2009 Bahamas 1
2010 Bahamas 1
2011 Bahamas 1
And here is how the X dummy should look:
year country Y X1
1990 Bahamas 1 1
1991 Bahamas NA 0
1992 Bahamas NA 0
1993 Bahamas 0 0
1994 Bahamas 1 1
1995 Bahamas 1 0
1996 Bahamas NA 0
1997 Bahamas 1 1
1998 Bahamas NA 0
1999 Bahamas 1 1
2000 Bahamas NA 0
2001 Bahamas 1 1
2002 Bahamas 1 0
2003 Bahamas 0 0
2004 Bahamas NA 0
2005 Bahamas 0 0
2006 Bahamas 0 0
2007 Bahamas 1 1
2008 Bahamas NA 0
2009 Bahamas 1 0
2010 Bahamas 1 0
2011 Bahamas 1 0
This is a bit too complicated for me. I've been reading about dplyr, which seems to be a relevant package here. My reading has so far led me to this code:
df %>% mutate(X=ifelse(Y >0) & lag(Y,2,))
I get the error:
argument "yes" is missing, with no default
Please tell me what I am doing wrong here. Should I put the ifelse before the lag as well?
Thanks.
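The error is raised by ifelse() itself, not by lag(): its signature is ifelse(test, yes, no), and in the call above the parenthesis closes right after Y > 0, so ifelse() receives only a test and the & lag(Y, 2) part sits outside it. A minimal base-R sketch of well-formed calls, on a small made-up vector y rather than the real data; the two-year shift is written in base R and is assumed to match dplyr's lag(y, 2) for vectors of length 2 or more:

```r
y <- c(1, NA, 1, 1, 1)

# ifelse(test, yes, no) needs all three arguments.
# NA is treated as "not positive", so the dummy is always 0 or 1.
x <- ifelse(!is.na(y) & y > 0, 1, 0)

# Value of y two positions earlier, NA-padded at the start
# (same result as dplyr::lag(y, 2) for this vector).
y_lag2 <- c(NA, NA, head(y, -2))

# Both conditions go inside a single ifelse() call.
both <- ifelse(!is.na(y) & y > 0 & !is.na(y_lag2) & y_lag2 > 0, 1, 0)
```

This only fixes the syntax; flagging the first 1 before each NA, as in the expected output, also needs grouping, which is what the answers below do.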
Answer 1:
library(dplyr)
dat <- readr::read_table(
"year country Y
1990 Bahamas 1
1991 Bahamas NA
1992 Bahamas NA
1993 Bahamas 0
1994 Bahamas 1
1995 Bahamas 1
1996 Bahamas NA
1997 Bahamas 1
1998 Bahamas NA
1999 Bahamas 1
2000 Bahamas NA
2001 Bahamas 1
2002 Bahamas 1
2003 Bahamas 0
2004 Bahamas NA
2005 Bahamas 0
2006 Bahamas 0
2007 Bahamas 1
2008 Bahamas NA
2009 Bahamas 1
2010 Bahamas 1
2011 Bahamas 1
")
expected_output <- readr::read_table(
"year country Y X1
1990 Bahamas 1 1
1991 Bahamas NA 0
1992 Bahamas NA 0
1993 Bahamas 0 0
1994 Bahamas 1 1
1995 Bahamas 1 0
1996 Bahamas NA 0
1997 Bahamas 1 1
1998 Bahamas NA 0
1999 Bahamas 1 1
2000 Bahamas NA 0
2001 Bahamas 1 1
2002 Bahamas 1 0
2003 Bahamas 0 0
2004 Bahamas NA 0
2005 Bahamas 0 0
2006 Bahamas 0 0
2007 Bahamas 1 1
2008 Bahamas NA 0
2009 Bahamas 1 0
2010 Bahamas 1 0
2011 Bahamas 1 0
")
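Identify the groups ending with NA, find the position of the first 1 in the Y column, and create the X1 column with 1s at the found positions. A group only qualifies if it ends with an NA and has a 1 among its last three values (the NA itself and the two years before it):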
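res <-
  dat %>%
  # build the run ids separately within each country
  group_by(country) %>%
  # a new run starts on the first row and on every row that follows an NA;
  # .add = TRUE keeps the country grouping (dplyr >= 1.0; older versions used add = TRUE)
  group_by(grp = cumsum(is.na(lag(Y))), .add = TRUE) %>%
  # position of the first 1 in the run; 0 or NA when the run has no closing NA
  # or no 1 among its last three values
  mutate(first_year_at_1 = match(1, Y) * any(is.na(Y)) * any(tail(Y, 3) == 1L),
         # start from all zeros; assigning at index 0 or NA leaves x unchanged
         X1 = {x <- integer(length(Y)) ; x[first_year_at_1] <- 1L ; x}) %>%
  ungroup()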
all.equal(select(res, -grp, -first_year_at_1), expected_output)
# [1] TRUE
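To see what the grouping key cumsum(is.na(lag(Y))) does, here is a base-R sketch on the sample's Y column (Bahamas only). The expression c(TRUE, is.na(head(y, -1))) is assumed to stand in for is.na(lag(Y)), since dplyr's lag() pads the first position with NA:

```r
# Y column of the sample data, years 1990 to 2011
y <- c(1, NA, NA, 0, 1, 1, NA, 1, NA, 1, NA,
       1, 1, 0, NA, 0, 0, 1, NA, 1, 1, 1)

# TRUE on the first row and on every row whose previous value is NA
starts_run <- c(TRUE, is.na(head(y, -1)))

# Running count of run starts: one id per run, each run closed by an NA
grp <- cumsum(starts_run)

# Inspect the runs, e.g. runs[["3"]] holds 1993 to 1996: 0 1 1 NA
runs <- split(y, grp)
```

Every run except the last one (2009 to 2011) ends in NA, which is why that last run gets no 1 in X1.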
(Note: if the real dataset contains several countries, you might want to group by country
first to avoid undesirable effects at the junctions between countries. I have edited my answer accordingly.)
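A base-R sketch of the same per-country idea, using a hypothetical helper first_one_before_na() and a made-up two-country panel. The helper implements the simpler rule (flag the first 1 in every run closed by an NA, without the tail(Y, 3) check), which reproduces the expected output on the sample; ave() applies it to each country separately, so an NA at the start of one country cannot close a run belonging to the previous one. Rows are assumed to be sorted by year within each country.

```r
# Hypothetical helper: 0/1 dummy for one country's Y series, sorted by year.
first_one_before_na <- function(y) {
  x <- integer(length(y))
  if (length(y) == 0) return(x)
  # same run ids as cumsum(is.na(lag(Y)))
  grp <- cumsum(c(TRUE, is.na(head(y, -1))))
  for (g in unique(grp)) {
    idx <- which(grp == g)
    run <- y[idx]
    # only runs closed by an NA qualify
    if (is.na(run[length(run)])) {
      pos <- match(1, run)  # first 1 in the run, NA if there is none
      if (!is.na(pos)) x[idx[pos]] <- 1L
    }
  }
  x
}

# Made-up panel: country A has no NA, country B starts with one
panel <- data.frame(
  year    = c(2000, 2001, 2002, 2000, 2001, 2002),
  country = c("A", "A", "A", "B", "B", "B"),
  Y       = c(0, 1, 1, NA, 1, NA)
)

# Without grouping, B's leading NA closes A's run and flags A's 2001
pooled <- first_one_before_na(panel$Y)

# Grouped by country: only B's 2001 is flagged
panel$X1 <- ave(panel$Y, panel$country, FUN = first_one_before_na)
```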
Answer 2:
A solution can be found using the dplyr package. The approach is to split the data into groups that each end with NA. Within a group whose last Y is NA, X1 is set to 1 on the first row with Y == 1; every other row gets X1 = 0.
library(dplyr)
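df %>%
  # a new group starts on the first row and on every row that follows an NA
  group_by(Grp = cumsum(is.na(lag(Y)))) %>%
  # 1 for the first Y == 1 in a group closed by an NA, 0 everywhere else
  mutate(X1 = ifelse(row_number() == min(which(Y == 1)) & is.na(last(Y)), 1, 0)) %>%
  ungroup() %>%
  select(-Grp) %>%
  as.data.frame()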
# year country Y X1
# 1 1990 Bahamas 1 1
# 2 1991 Bahamas NA 0
# 3 1992 Bahamas NA 0
# 4 1993 Bahamas 0 0
# 5 1994 Bahamas 1 1
# 6 1995 Bahamas 1 0
# 7 1996 Bahamas NA 0
# 8 1997 Bahamas 1 1
# 9 1998 Bahamas NA 0
# 10 1999 Bahamas 1 1
# 11 2000 Bahamas NA 0
# 12 2001 Bahamas 1 1
# 13 2002 Bahamas 1 0
# 14 2003 Bahamas 0 0
# 15 2004 Bahamas NA 0
# 16 2005 Bahamas 0 0
# 17 2006 Bahamas 0 0
# 18 2007 Bahamas 1 1
# 19 2008 Bahamas NA 0
# 20 2009 Bahamas 1 0
# 21 2010 Bahamas 1 0
# 22 2011 Bahamas 1 0
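One detail in the code above: min(which(Y == 1)) is also evaluated for groups that contain no 1 at all (in the sample, the single 1992 row forms such a group). which() then returns an empty vector, and min() returns Inf with a warning; the comparison with row_number() is simply FALSE, so the result is still correct. If the warnings are a nuisance, match(1, Y, nomatch = 0L) is one alternative that returns 0 instead. A base-R check on that 1992 group, using a plain vector as a stand-in for the grouped column:

```r
# The 1992 row on its own: a group closed by NA that contains no 1
run <- NA_real_

# which() finds nothing, so min() warns and returns Inf
pos_min <- suppressWarnings(min(which(run == 1)))

# match() with nomatch = 0L returns 0 instead, and no row number equals 0
pos_match <- match(1, run, nomatch = 0L)

# Same flag either way: FALSE for every row of this group
flag_min   <- seq_along(run) == pos_min   & is.na(run[length(run)])
flag_match <- seq_along(run) == pos_match & is.na(run[length(run)])
```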
Data:
df <- read.table(text =
"year country Y
1990 Bahamas 1
1991 Bahamas NA
1992 Bahamas NA
1993 Bahamas 0
1994 Bahamas 1
1995 Bahamas 1
1996 Bahamas NA
1997 Bahamas 1
1998 Bahamas NA
1999 Bahamas 1
2000 Bahamas NA
2001 Bahamas 1
2002 Bahamas 1
2003 Bahamas 0
2004 Bahamas NA
2005 Bahamas 0
2006 Bahamas 0
2007 Bahamas 1
2008 Bahamas NA
2009 Bahamas 1
2010 Bahamas 1
2011 Bahamas 1",
header = TRUE, stringsAsFactors = FALSE)
Source: https://stackoverflow.com/questions/50556268/how-to-create-conditional-dummies-before-the-event-with-dplyr-in-r