Question
I have a family of variable names that differ only in their last four characters (a year), and I would like to create all of these variables at once.
In Stata I would simply do this:
forvalues n=1991(1)1995 {
    gen comp`n' = (year_begin < `n' & (year_end > `n' | year_end == .))
}
Here’s what I’m doing in R:
data$comp1991<-ifelse(year(data$date_begin)<1991 & (year(data$date_end)>1991|is.na(data$date_end)),1,0)
data$comp1992<-ifelse(year(data$date_begin)<1992 & (year(data$date_end)>1992|is.na(data$date_end)),1,0)
data$comp1993<-ifelse(year(data$date_begin)<1993 & (year(data$date_end)>1993|is.na(data$date_end)),1,0)
data$comp1994<-ifelse(year(data$date_begin)<1994 & (year(data$date_end)>1994|is.na(data$date_end)),1,0)
data$comp1995<-ifelse(year(data$date_begin)<1995 & (year(data$date_end)>1995|is.na(data$date_end)),1,0)
So in Stata I really need only one line of code, whereas in R I have to repeat this line over and over, changing the year by hand.
Is there a way to do this more efficiently in R? (I am thinking of some combination of a loop with eval(parse()), but I'm not sure.) Any ideas would be appreciated.
Answer 1:
To elaborate on some of the comments, the closest equivalent of the Stata loop you provided would be:
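library(lubridate)  # provides year(); data.table::year() works as well
# Build comp1991 ... comp1995; the year n must replace the hard-coded 1991
for (n in seq(1991, 1995)) {
  data[[paste0('comp', n)]] <- year(data$date_begin) < n & (year(data$date_end) > n | is.na(data$date_end))
}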
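To check that the loop builds the expected columns, here is a minimal sketch on a small invented data frame. The column names date_begin and date_end follow the question; the dates are made up, and the helper year_of() is a hypothetical base-R stand-in for year(), used only so the example needs no extra package.

```r
# Invented toy data; NA in date_end means the spell is still ongoing
data <- data.frame(
  date_begin = as.Date(c("1989-06-01", "1992-03-15", "1990-01-01")),
  date_end   = as.Date(c("1993-12-31", NA, "1991-05-01"))
)

# Hypothetical helper: base-R equivalent of lubridate::year() for Date columns
year_of <- function(d) as.integer(format(d, "%Y"))

for (n in 1991:1995) {
  # paste0() builds the column name; [[ ]] assigns it, so no eval(parse()) is needed
  data[[paste0("comp", n)]] <-
    year_of(data$date_begin) < n &
    (year_of(data$date_end) > n | is.na(data$date_end))
}

print(data)
```

The key point is that data[[name]] accepts a column name stored in a character string, which is what replaces Stata's macro substitution here.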
The comparison returns 0 and 1 in Stata, but FALSE and TRUE in R. In practice there is little difference: R treats TRUE and FALSE as 1 and 0 in arithmetic, so you can sum or average them just as you would the Stata indicators.
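If you do want literal 0/1 values, wrapping the condition in as.integer() is enough. A short illustration on an invented logical vector follows. One difference worth knowing: a comparison involving NA yields NA in R, whereas Stata treats missing as larger than any number, so its comparisons always give 0 or 1.

```r
# Invented example vector of indicator values
flags <- c(TRUE, FALSE, TRUE, NA)

# Explicit 0/1 coding, like the values Stata's gen would store
as.integer(flags)         # 1 0 1 NA

# Logicals already act as 0/1 in arithmetic
sum(flags, na.rm = TRUE)  # 2
```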
If you want the loop to look even more like the Stata code, you can drop the repeated references to the object data by using the data.table package:
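library(data.table)
data <- data.table(data)  # setDT(data) converts in place without copying
for (n in seq(1991, 1995)) {
  # := adds the column by reference; year() here is data.table::year()
  data[, paste0('comp', n) := year(date_begin) < n & (year(date_end) > n | is.na(date_end))]
}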
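If you would rather avoid an explicit loop altogether, one alternative (not part of the original answer) is to build all indicator vectors with lapply() and attach them in a single assignment. This sketch reuses invented toy data and a hypothetical base-R helper year_of() in place of year(), and stores 0/1 integers to mirror Stata.

```r
# Invented toy data, as in the question's layout
data <- data.frame(
  date_begin = as.Date(c("1989-06-01", "1992-03-15", "1990-01-01")),
  date_end   = as.Date(c("1993-12-31", NA, "1991-05-01"))
)
# Hypothetical helper standing in for year()
year_of <- function(d) as.integer(format(d, "%Y"))

years <- 1991:1995
begin <- year_of(data$date_begin)
end   <- year_of(data$date_end)

# One 0/1 integer vector per year
flags <- lapply(years, function(n) as.integer(begin < n & (end > n | is.na(end))))
names(flags) <- paste0("comp", years)

# Add all five columns comp1991 ... comp1995 at once
data[names(flags)] <- flags
```

Computing the years once outside the loop avoids repeating the year extraction for every column, although for five columns the difference is negligible.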
Source: https://stackoverflow.com/questions/35439914/parse-variable-names-in-r-vs-stata