问题
I am using ddply
to execute glm
on subsets of my data. I am having difficulty accessing the estimated Y values. I am able to get the model parameter estimates using the below code, but all the variations I've tried to get the fitted values have fallen short. The dependent and independent variables in the glm
model are column vectors, as is the "Dmsa"
variable used in the ddply
operation.
Define the model:
Model <- function(df){coef(glm(Y~D+O+B+A+log(M), family=poisson(link="log"), data=df))}
Execute the model on subsets:
Modrpt <- ddply(msadata, "Dmsa", Model)
Print Modrpt
gives the model coefficients, but no Y estimates.
I know that if I wasn't using ddply
, I can access the glm
estimated Y values by using the code:
Model <- glm(Y~D+O+B+A+log(M), family=poisson(link="log"), data=msadata)
fits <- Model$fitted.values
I have tried both of the following to get the fitted values for the subsets, but no luck:
fits <- fitted.values(ddply(msadata, "Dmsa", Model))
fits <- ddply(msadata, "Dmsa", fitted.values(Model))
I'm sure this is a very easy to code...unfortunately, I'm just learning R. Does anyone know where I am going wrong?
回答1:
You can use an anonymous function in your call to ddply
e.g.
require(plyr)
data(iris)
model <- function(df){
lm( Petal.Length ~ Sepal.Length + Sepal.Width , data = df )
}
ddply( iris , "Species" , function(x) fitted.values( model(x) ) )
This has the advantage that you can also, without rewriting your model
function, get thecoef
values by doing
ddply( iris , "Species" , function(x) coef( model(x) ) )
As @James points out, this will fall down if you have splits of unequal size, better to use dlply
which puts the result of each subset in it's own list element.
(I make no claims for statistical relevance or correctness of the example model - it is just an example)
回答2:
I'd recommending doing this in two steps:
library(plyr)
# First first the models
models <- dlply(iris, "Species", lm,
formula = Petal.Length ~ Sepal.Length + Sepal.Width )
# Next, extract the fitted values
ldply(models, fitted.values)
# Or maybe
ldply(models, as.data.frame(fitted.values))
来源:https://stackoverflow.com/questions/18082160/accessing-fitted-values-when-using-ddply