Regression on a subset in R

Question

I want to run the same regression for different countries (i.e. subsets of my data). I figured out how to do it in R, but after doing the same thing much more easily in Stata, I wonder if there's a better way in R.

In Stata you would do something like this:

foreach country in USA UK France {
    reg y x1 x2 if country == "`country'"
}

Simple and human-readable, right? In R, I came up with split and ddply methods, but both are more complicated. Using split:

data.subset <- split(data, data$country)[c("USA", "UK", "France")]
res <- lapply(data.subset, function(subset) lm(y ~ x1 + x2, data=subset))

More compact code would use ddply (from the plyr package), but in this case the model is run for all countries. Can I choose just a few?

ddply(data, "country", function(df) coefficients(lm(y ~ x1 + x2, data=df)))

But again, I'd like to know whether there is an intuitive, readable for loop like the one in Stata.

Answer 1:

There are several options:

One way using ddply:

ddply( data[ data$country %in% c('USA','UK','France'), ], "country", function(df) coefficients(lm(y ~ x1 + x2, data=df)))

Using lapply (or sapply) in a different way, with the subset argument of lm:

lapply( c("USA","UK","France"), function(curcont) lm(y ~ x1+x2, data=data, subset= country==curcont))

You could use the lmList function from the nlme package.
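A minimal sketch of the lmList approach, assuming the column names from the question (y, x1, x2, country). The data frame dat and its simulated values are hypothetical and only there to make the example runnable:

```r
library(nlme)  # provides lmList

# Hypothetical toy data; only the column names follow the question.
set.seed(1)
dat <- data.frame(
  country = rep(c("USA", "UK", "France", "Spain"), each = 25),
  x1 = rnorm(100),
  x2 = rnorm(100)
)
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(100)

# Keep only the countries of interest and make the grouping variable a factor.
keep <- dat[dat$country %in% c("USA", "UK", "France"), ]
keep$country <- factor(keep$country)

# The part after "|" names the grouping variable: one lm fit per country.
fits <- lmList(y ~ x1 + x2 | country, data = keep)

coef(fits)  # one row of coefficients per country
```

Subsetting the data before the call keeps the grouping factor free of unused levels, so the result contains exactly one fit per selected country.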

You could use lm directly (though this will use a single pooled estimate of the residual variance instead of a separate one for each country):

lm( y ~ 0 + factor(country) * (x1 + x2), data=data, subset= country %in% c('USA','UK','France') )

There are also the by function, for loops, and probably other options as well.
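Since the question asks for something close to the Stata foreach loop, here is a rough sketch of the for loop and by variants. The data frame dat, the loop variable cn, and the simulated values are hypothetical; only the column names come from the question:

```r
# Hypothetical toy data; only the column names follow the question.
set.seed(1)
dat <- data.frame(
  country = rep(c("USA", "UK", "France", "Spain"), each = 25),
  x1 = rnorm(100),
  x2 = rnorm(100)
)
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(100)

countries <- c("USA", "UK", "France")

# for loop: fit the model on each country's rows and store it by name.
fits <- list()
for (cn in countries) {
  fits[[cn]] <- lm(y ~ x1 + x2, data = dat[dat$country == cn, ])
}
sapply(fits, coef)  # coefficient matrix, one column per country

# by: split the pre-filtered data on country and fit the model on each piece.
keep <- dat[dat$country %in% countries, ]
fits_by <- by(keep, keep$country, function(d) lm(y ~ x1 + x2, data = d))
```

The loop stores whole lm objects, so summary(fits[["UK"]]) or any other extractor can be applied afterwards, much like inspecting each reg output in Stata.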

Source: https://stackoverflow.com/questions/18935293/regression-on-a-subset-in-r

Tags

loops

subset

stata