Question
I want to run the same regression for different countries (i.e. subsets of my data). I figured out how to do it in R, but after doing the same thing much more easily in Stata, I wonder whether there is a better way in R.
In Stata you would do something like this:
foreach country in USA UK France {
reg y x1 x2 if country == "`country'"
}
Simple and human-readable, right? In R, I came up with the split and ddply approaches, both of which are more complicated. Using split:
data.subset <- split(data, data$country)[c("USA", "UK", "France")]
res <- lapply(data.subset, function(subset) lm(y ~ x1 + x2, data=subset))
More compact code would use ddply, but in this case the model is run for all countries. Can I choose just a few?
library(plyr)
ddply(data, "country", function(df) coefficients(lm(y ~ x1 + x2, data = df)))
But again, I'm interested in knowing whether there is an intuitive, readable for loop like the one in Stata.
Answer 1:
There are several options:
One way using ddply:
ddply(data[data$country %in% c("USA", "UK", "France"), ], "country", function(df) coefficients(lm(y ~ x1 + x2, data = df)))
Using lapply (or sapply) in a different way:
lapply( c("USA","UK","France"), function(curcont) lm(y ~ x1+x2, data=data, subset= country==curcont))
You could use the lmList function from the nlme package.
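A minimal sketch of the lmList route, assuming a simulated data frame in place of the asker's data (the values, the extra country "Germany", and the names keep and fits are made up for illustration). The part after | in the formula names the grouping variable, and the data are filtered to the chosen countries first:

```r
library(nlme)  # recommended package that ships with standard R installs

# Simulated stand-in for the asker's data (hypothetical values).
set.seed(1)
data <- data.frame(
  country = rep(c("USA", "UK", "France", "Germany"), each = 25),
  x1 = rnorm(100),
  x2 = rnorm(100)
)
data$y <- 1 + 2 * data$x1 - data$x2 + rnorm(100)

# Keep only the countries of interest, then fit one lm per country.
keep <- data[data$country %in% c("USA", "UK", "France"), ]
fits <- lmList(y ~ x1 + x2 | country, data = keep)

# One row of coefficients per country.
print(coef(fits))
```

Calling summary(fits) reports the per-country estimates in one table.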
You could use lm directly (though this will use a pooled estimate of the variance instead of separate ones):
lm( y ~ 0 + factor(country) * (x1 + x2), data=data, subset= country %in% c('USA','UK','France') )
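To see what the pooled fit means in practice, here is a rough sketch on simulated data (the data values and the names keep, pooled, uk, and uk_x1 are hypothetical). With this formula the slopes of the reference country appear as x1 and x2, and the other countries' slopes are the reference slope plus an interaction term. The point estimates match a separate per-country fit, but the residual standard error is shared across countries:

```r
# Simulated stand-in for the asker's data (hypothetical values).
set.seed(1)
data <- data.frame(
  country = rep(c("USA", "UK", "France", "Germany"), each = 25),
  x1 = rnorm(100),
  x2 = rnorm(100)
)
data$y <- 1 + 2 * data$x1 - data$x2 + rnorm(100)

keep <- data[data$country %in% c("USA", "UK", "France"), ]

# Single model with country-specific intercepts and slopes.
pooled <- lm(y ~ 0 + factor(country) * (x1 + x2), data = keep)

# Separate fit for one country, for comparison.
uk <- lm(y ~ x1 + x2, data = keep, subset = country == "UK")

# France is the reference level (first alphabetically), so the UK slope
# on x1 is the reference slope plus the UK interaction term.
b <- coef(pooled)
uk_x1 <- b[["x1"]] + b[["factor(country)UK:x1"]]

print(c(pooled = uk_x1, separate = coef(uk)[["x1"]]))
# The residual standard errors differ: one is pooled, one is UK-only.
print(c(pooled = summary(pooled)$sigma, separate = summary(uk)$sigma))
```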
There is also the by function and for loops, and probably other options as well.
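For the Stata-style loop the question asks about, a for loop and by could look roughly like this. It is a sketch on simulated data; the data values and the names fits, keep, and fits_by are assumptions, not part of the original posts:

```r
# Simulated stand-in for the asker's data (hypothetical values).
set.seed(1)
data <- data.frame(
  country = rep(c("USA", "UK", "France", "Germany"), each = 25),
  x1 = rnorm(100),
  x2 = rnorm(100)
)
data$y <- 1 + 2 * data$x1 - data$x2 + rnorm(100)

# for loop: one fit per selected country, stored in a named list.
fits <- list()
for (cc in c("USA", "UK", "France")) {
  fits[[cc]] <- lm(y ~ x1 + x2, data = data[data$country == cc, ])
}

# by(): filter the rows first, then fit each country's piece.
keep <- data[data$country %in% c("USA", "UK", "France"), ]
fits_by <- by(keep, keep$country, function(df) lm(y ~ x1 + x2, data = df))

# Coefficients side by side, one column per country.
print(sapply(fits, coef))
```

Individual models are then available as fits[["USA"]] or fits_by[["USA"]], so summary(fits[["USA"]]) gives the usual regression table for one country.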
Source: https://stackoverflow.com/questions/18935293/regression-on-a-subset-in-r