I have a large dataset with several variables, one of which is a state variable, coded 1-50 for each state. I'd like to run a regression of 28 variables on the remaining 2
This is another example of the classic Split-Apply-Combine problem, which can be addressed using the plyr package by @hadley. In your problem, you want to
I will illustrate it with the Cars93 dataset from the MASS package. We are interested in the relationship between Horsepower and EngineSize, estimated separately for each country of origin (the Origin variable, USA vs. non-USA).
# LOAD LIBRARIES
library(MASS)   # provides the Cars93 dataset
library(plyr)   # provides dlply() and ldply()
# SPLIT-APPLY-COMBINE
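# split Cars93 by Origin and fit one lm per group; the result is a list of models
regressions <- dlply(Cars93, .(Origin), lm, formula = Horsepower ~ EngineSize)
# extract the coefficients of each model and combine them into one data frame
coefs <- ldply(regressions, coef)
coefs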
   Origin (Intercept) EngineSize
1     USA    33.13666   37.29919
2 non-USA    15.68747   55.39211
EDIT: For your example, substitute PUF for Cars93, state for Origin, and fm for the formula.
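If you would rather not depend on plyr, the same split-apply-combine steps can be sketched in base R. This is only an illustration on the Cars93 data, not part of the plyr approach above; the names pieces, fits and coefs_base are made up for this example. Because it fits the same models, it should give the same coefficients as the table above, but as a matrix with the Origin levels as row names instead of a data frame.

```r
library(MASS)  # provides the Cars93 dataset

# SPLIT: one data frame per level of Origin
pieces <- split(Cars93, Cars93$Origin)

# APPLY: fit the same linear model to each piece
fits <- lapply(pieces, function(d) lm(Horsepower ~ EngineSize, data = d))

# COMBINE: stack the coefficient vectors into a matrix, one row per group
coefs_base <- do.call(rbind, lapply(fits, coef))
coefs_base
```

The plyr version is shorter and returns a data frame directly, which is usually more convenient when there are many groups, as with 50 states.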